ABSTRACT

Pursuit and evasion problems are probably the most natural
application of differential game theory and have been treated by many
authors as such. Very few problems of this class can be solved
analytically. Fast and efficient numerical algorithms are needed to
solve for an optimal or near-optimal solution of a realistic pursuit
and evasion differential game.

Some headway has been made in the development of numerical
algorithms for this purpose. Most researchers, however, have worked under
the assumption that a saddle point exists for their differential game.
Here, it is shown via two examples and a nonlinear stochastic
differential game that this is not always the case.

A first-order algorithm for computing an optimal control for each
player, subject to control and/or state constraints, is developed without
the assumption of saddle point existence. It is shown that a linear
quadratic differential game with control and/or state constraints
generally cannot be solved analytically. One such problem is developed
and solved by the above algorithm. A new rationale is offered
for formulating a missile anti-missile problem as a nonlinear stochastic
differential game. The algorithm developed here, together with a
convergence control method introduced by Jarmark, is used to solve
the missile anti-missile problem with fast computation time.
CHAPTER 1

INTRODUCTION, LITERATURE SURVEY, AND
SCOPE OF DISSERTATION

Pursuit and evasion problems have been treated by many
authors as differential games. Analytically, only linear
quadratic differential games have been solved. Functional
analysis has served as a tidy approach for gaining valuable
insight into some aspects of differential game theory. However,
only the simplest mathematical problems, which bear
little or no resemblance to real physical situations,
have been solved by this approach.

At present, the hope of obtaining an optimal or near-optimal
solution of a realistic pursuit and evasion differential
game seems to lie in efficient numerical algorithms. To make
this dissertation as self-contained as possible, we shall start
with a brief background and history of game theory through a
literature survey of game theory in general, then narrow down to
the work done on numerical solutions, which will be covered in
the next chapter. A general structure of a differential game will
then be presented, and the formulation of the mathematical model
of a differential game will be discussed. Lastly, we shall
conclude this chapter with a statement of the objectives and the
significance of what we hope to accomplish in this dissertation.


1.1  Literature  Survey 


The problem of pursuit as a mathematical model
originated in the fifteenth century with Leonardo da Vinci,
according to Davis. In 1732 Bouguer proposed and solved
for the optimal curve along which a vessel moves in pursuing
another that flees along a straight line, supposing that
the velocities of the two vessels are always in the same
ratio. More recently, Hathaway, Archibald, and Manning
in 1921 worked on a more difficult problem in which the
evader moves on a circle.

During the same year (1921) Emile Borel attempted to
abstract strategic situations of game theory into a mathematical
theory of strategy. After John von Neumann proved
the Minimax Theorem in 1928, the theory was firmly established.
However, academic interest in game
theory did not catch on until the publication in 1944 of
the impressive work by John von Neumann and Oskar Morgenstern,
Theory of Games and Economic Behavior. This book
pointed out a new approach to the general
problem of competitive behavior, specifically in economics,
through a study of games of strategy. It was soon realized
that the applications of the theory are not limited to
economics but extend to military, political,
and other civil organizations as well.

Since then a great amount of research on game theory
has been published; a bibliography compiled in 1959 contains
more than one thousand entries. It is therefore impossible
to mention all these reports. Only a brief overview of the
part of the field that is closely related to this dissertation
will be presented here.

It is interesting to note that the games of pursuit
mentioned so far in the preceding paragraphs are one-sided
optimal control problems where only the pursuers have freedom
of movement while the evaders move on predetermined
trajectories. A new dimension, in which both players have
the freedom to choose their motions, was added by Isaacs
when he began the development of the theory of differential
games at the Rand Corporation. Isaacs compiled all his
results in a book published in 1965. Ho (8) provided
control engineers with a review of Isaacs' book in more
familiar terminology. It was here that the elements of game
theory were married to the theory of optimal control.

Briefly, Isaacs is concerned with problems with the payoff
function

$$J(x_0, u(t), v(t)) = F(x(T), T) + \int_0^T L(x(t), u(t), v(t); t)\, dt \tag{1.1}$$

and dynamics

$$\dot{x} = f(x, u, v; t), \qquad x(0) = x_0 \tag{1.2}$$

where T is the final time or the time when the state trajectory
meets a given terminal manifold. He assumes that a
saddle point exists, an assumption which is not always true.
The precise meaning of a saddle point will be given in the
next chapter when we discuss the solution of differential
games. At the saddle point, the payoff function is called
the value of the game and is designated by $J^*(x,t)$. Isaacs
uses what he called the Tenet of Transition, a game theory
equivalent of Bellman's Principle of Optimality (which he
apparently found independently and in fact may have predated),
to show that the value function must satisfy his Main
Equation One, or $ME_1$,

$$\frac{\partial J^*}{\partial t} + \min_u \max_v \left[ J_x^{*T} f(x, u, v; t) + L(x, u, v; t) \right] = 0 \tag{1.3}$$

In principle, $ME_1$ can be used to solve for $u^* = u^*(x, J_x^*; t)$
and $v^* = v^*(x, J_x^*; t)$. $u^*$ and $v^*$ are then substituted back into
$ME_1$ to give the Main Equation Two, or $ME_2$,

$$\frac{\partial J^*}{\partial t} + J_x^{*T} f(x, u^*, v^*; t) + L(x, u^*, v^*; t) = 0 \tag{1.4}$$

This is a Hamilton-Jacobi type equation and is often referred
to as a Hamilton-Jacobi-Bellman equation or a pre-Hamiltonian
equation, which is somewhat of an injustice to
Isaacs. These equations will be used in our development of
a numerical algorithm in the next chapter.
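
To illustrate how (1.3) leads to (1.4), consider an assumed special case that is not taken from the text above: linear dynamics with a quadratic running cost. The matrices A, B, C, Q, R, and S are hypothetical, with R and S taken to be positive definite, so the bracket in Main Equation One is convex in u and concave in v and the inner minimum and maximum can be found by setting gradients to zero.

```latex
% Assumed special case, for illustration only.
f = A x + B u + C v, \qquad
L = \tfrac12 \left( x^T Q x + u^T R u - v^T S v \right)

% Stationarity of the bracket in (1.3) with respect to u and v:
B^T J_x^* + R u^* = 0 \;\Rightarrow\; u^* = -R^{-1} B^T J_x^*
C^T J_x^* - S v^* = 0 \;\Rightarrow\; v^* = S^{-1} C^T J_x^*

% Substituting u^* and v^* back into (1.3) gives an equation of the form (1.4):
\frac{\partial J^*}{\partial t} + J_x^{*T} A x + \tfrac12 x^T Q x
  - \tfrac12 J_x^{*T} B R^{-1} B^T J_x^*
  + \tfrac12 J_x^{*T} C S^{-1} C^T J_x^* = 0
```

In this example the bracket splits into a part depending only on u and a part depending only on v, so the order of the minimum and maximum does not matter. That is not guaranteed when the bracket contains terms coupling u and v.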

Isaacs also contributes towards the sufficiency part
of the solution of the game through his so-called Verification
Theorem. In essence, he states that if $J^*(x,t)$ is a
unique continuous function satisfying the main equations
and the boundary condition $J^*(x(T),T) = F(x(T),T)$, then $J^*$
is the value of the game, and any $u^*$ and $v^*$ which satisfy
$ME_2$ and cause the desired end points to be reached are
optimal. He proves this theorem as the limit of a convergent
series of discrete approximations to the differential game.

Gadzhiev (15) worked out necessary and sufficient conditions
for the existence of a pure strategy solution for
a problem with a quadratic cost function and linear dynamic
systems. He also stated that a pure strategy solution
for a general differential game might not exist. In exploring
the application of the celebrated Minimax Principle, he met
with only limited success because of the difficulty in defining a
probability measure for the controls available for play,
which are time functions with infinite variability in
magnitude.

The most rigorous treatments to date are contained in the work
of Friedman and Berkovitz. Friedman, in his book
published in 1971, uses a functional analysis approach and goes
through a mathematical maze of complications to obtain
essentially the same results as Isaacs. Berkovitz extended
results of the classical calculus of variations to zero-sum
two-person differential games. His main results are: under
some fairly restrictive conditions, with the Hamiltonian-like
function


$$H(x, u, v, p) = L(x, u, v) + p^T f(x, u, v) \tag{1.5}$$

the optimal controls $u^*$ and $v^*$ satisfy the following equations

$$\begin{aligned}
\dot{x} &= H_p(x, u^*, v^*, p) \\
\dot{p} &= -H_x(x, u^*, v^*, p) \\
H_u + \mu^T g_{1u} &= 0, \qquad H_v + \nu^T g_{2v} = 0 \\
\mu_i \ge 0, \quad \mu_i g_{1i} &= 0, \qquad \nu_i g_{2i} = 0
\end{aligned} \tag{1.6}$$

where $g_1$ and $g_2$ are constraint functions on u and v respectively,
and $\mu$ and $\nu$ are the associated Lagrange multipliers. He
also establishes sufficiency conditions using field concepts.
All these results apply under the assumption of the existence
of a saddle point, which, we emphasize again, is an assumption
that is not always true.

As mentioned in the opening statement, analytical
results have indeed been rare except for the problem
with linear dynamics and quadratic cost. Athans presents
a review of recent works on differential games. Ho, Bryson,
and Baron model and solve a pursuit-evasion problem as a
linear quadratic game, deriving conditions for the existence of
a solution. The meagerness of analytical results, according
to Ho (8), is a direct consequence of the complications and
complexities introduced into the optimal control problem
by the "other" controller.


McFarland (18) stated that most authors elect to treat
each player's control as unconstrained, using integral
penalties in the cost function to preclude any solution
with infinite magnitude. Published results have indeed
been rare for differential games with bounded control. Progress
was made by Meschler and Mier on deterministic
problems of simple construction permitting analytical
treatment. Mier suggested that, under close examination,
generalization cannot be made. Other authors have made some
headway in this respect using numerical analysis. These
will be mentioned in the next chapter.

Another interesting approach to differential games is
the so-called geometric approach. Some of the more significant
contributions in this respect are the work of Blaquiere,
Gerard, and Leitmann (21) (22) (23) in an augmented
state space. Karlin and Shapley (24) also used a geometric
approach to provoke a rigorous investigation into the
geometry of moment spaces. The more recent works on the geometric
approach to game theory are those of Westphal (25) and
Westphal and Stubberud (26), where they synthesize mixed
strategies and find game values for both scalar and vector
controls. Herrelko (27) later extended these results to
cover the case with information time lag.

Although many questions still remain unanswered for
two-person zero-sum dynamic games with perfect information
and pure strategies, many researchers have moved into
the area of other games. One reason for this is that
the early works were not applicable to many real-world
problems, which are often n-person, non-zero-sum, and
stochastic. Each of these areas is a challenge in itself,
and most of the efforts to date have rightly been concentrated
on each area individually.

Analytical success with the linear-quadratic problem
has induced many authors to explore stochastic games. Most
of the work in this area has been on two-person zero-sum
linear-quadratic games with noisy transitions of dynamics,
random initial conditions, or noisy observations. According
to Bryson and Ho (27), the main effort has been to relate
solutions of these problems to the "certainty-equivalence"
principle of stochastic optimal control theory. This, however,
contains a logical fallacy in the treatment, either
implicit or explicit, of one player's control in his
opponent's estimator. Some of the contributors in the area
of stochastic differential games are Ho, Speyer, Behn and
Ho, Rhodes and Luenberger, Willman, Mons, Bley, etc.

To conclude this very brief overview of the historical
aspects of differential games, it is worthwhile to
mention that successful researchers have shown respect for
this quite new field and realize that the complications
involved amount to far more than an extension of optimal control.
Progress is made in careful steps, and examples are kept
simple so that the new concepts being uncovered can be made
clear.


1.2  Differential  Game  Structure 

In this section, an informal presentation will be given of a very
general type of differential game, in which there are any
number of players with different cost criteria and different
information sets. With this structure, some
general classifications of differential games will be made.
Figure 1 illustrates the basic structure of a general differential
game. The interval of play is assumed to be [0,T],
where T may be a fixed final time or the time when the
state trajectory first reaches a given terminal manifold.

At each instant t in the interval [0,T],
each of the N players chooses a
vector of control inputs, $u_i$, to optimize his cost criterion

$$J_i(u_1, \ldots, u_N; t) = F_i(x(T), T) + \int_0^T L_i(x, u_1, \ldots, u_N; t)\, dt, \qquad i = 1, 2, \ldots, N \tag{1.7}$$

These controls serve as input vectors to a common dynamic
system (shared by all players) described by a nonlinear vector
differential equation:

$$\dot{x} = f(x, u_1, \ldots, u_N, t, w(t)), \qquad x(0) = x_0 \tag{1.8}$$

where w(t) is a vector input of random noise, usually
Gaussian.

Generally some constraints $u_i \in U_i$, where

$$U_i = \{ u_i : g_i(u_1, \ldots, u_N; t) \le 0 \} \tag{1.9}$$

or a set of vector constraint equations, are also imposed on
the choice of the control vectors $u_i$.

At each time t, each player also has a set
of measurements, or information set, available to aid his
decision in choosing the control vector. These information
sets are accumulated by each player during the interval
[0,t] in the form

$$y_i = h_i(t, x(\tau), v_i(\tau)) \quad \text{for all } \tau \in [0, t] \tag{1.10}$$

where $v_i(\tau)$ is the noise vector input to each player's
measurement system.

To date, most differential games have been formulated in the two
special cases that follow:

(1) $h_i(t, x(\tau), v_i(\tau)) = x(\tau)$ for $0 \le \tau \le t$

where we have a deterministic system, or perfect measurements
of the state vector. If all information is
used, the solution is in closed-loop form.


(2) $h_i(t, x(\tau), v_i(\tau)) = x_0$ for $\tau = 0$, and $= 0$ for $\tau > 0$

Here only the initial state vector is known. The system
is still deterministic, but in this case the solution
can only be generated in open-loop form. It is interesting
to note that even if perfect measurements
are available, the controller may still not be able
to generate a closed-loop solution, depending on the
relative sizes of the computation time and the duration
of the game. There are only a few simple cases, namely
those with linear dynamics and quadratic payoffs, where a
closed-loop solution can be generated in closed form.
Other, more difficult cases which have recently drawn
some interest from researchers are:

(3) $h_i(t, x(\tau), v_i(\tau)) = H_i(\tau) x(\tau) + v_i(\tau)$ for $0 \le \tau \le t$

where $H_i$ may be either a time-invariant or a time-varying
matrix and $v_i(\tau)$ is additive white Gaussian noise. In
this case we have a stochastic differential game with
linear measurements. Again, only linear-quadratic differential
games have been solved with this information
set.
(4) $h_i(t, x(\tau), v_i(\tau)) = x_0 + v_i$ for $\tau = 0$, and $= 0$ for $\tau > 0$

where $v_i$ is a random variable, usually Gaussian. Here
we also have a stochastic game. A nonlinear version
of the pursuit-evasion differential game with this set of
information will be presented later in this report.


(5) $h_i(t, x(\tau), v_i(\tau)) = x(\tau)$ for $0 \le \tau \le t - \sigma$

Here we have perfect measurements with a time delay.
Only a handful of papers on differential games have been
written for this set of information. All the authors
limited themselves to simple problems in this case
because of the tremendous complexities involved.

Once the measurement is made and one of the information
sets listed above is formed, a control law can
then be generated. In deterministic cases, control laws
are generated directly from either analytical or numerical
solutions. In linear quadratic stochastic cases, it is well
known that the "Certainty Equivalence Principle" or the
"Separation Theorem" can be extended from the theory of
stochastic optimal control. That is, the estimation process
can be carried out first using some type of filter, the most
well known being the Kalman-Bucy filter; the estimated states
can then be used to generate a control law as if they were
deterministic.


Further classification of differential games can be
made from the cost functions. If $\sum_{i=1}^{N} J_i = 0$, the
game is called an N-person zero-sum differential game.
If $\sum_{i=1}^{N} J_i \ne 0$, it is called an N-person non-zero-sum
differential game. One important class is the two-person
zero-sum differential game. This is the class of differential
game that we shall be concerned with throughout this
dissertation. An exact formulation of the two-person zero-sum
differential game will be given in the next section.


1.3  Differential  Game  Formulation 

From here on in this report, we shall be concerned with
two-person zero-sum differential games, whose formulation
is given as follows:

There are two opposing players U and V who choose their
control strategies to drive the equation of motion of the
dynamic system:

$$\frac{d}{dt} x(t) = f[x(t), u(t), v(t), t]; \qquad x(0) = x_0 \tag{1.11}$$

where

x(t) = state vector of dimension n x 1
u(t) = control input of the minimizing player,
       dimension m x 1
v(t) = control input of the maximizing player,
       dimension p x 1


The duration of play [0,T] is fixed; the terminal time
T is given explicitly in the problem. The vector function
f is assumed piecewise continuous during the interval of
play, and differentiable up to any order required in all its
arguments.

The vector control functions u(t) and v(t) are piecewise
continuous, differentiable functions of time, and belong to
prescribed feasibility regions $u(t) \in U$ and $v(t) \in V$,
where

$$U = \{ u : g_1(x, u, t) \le 0 \} \tag{1.12}$$

$$V = \{ v : g_2(x, v, t) \le 0 \} \tag{1.13}$$

and $g_1$ and $g_2$ are vector state and control constraints of
dimension $\le m$ and $\le p$, respectively.

U is minimizing and V is maximizing the following scalar
cost functional:

$$J(u, v) = F(x(T), T) + \int_0^T L(x(t), u(t), v(t), t)\, dt \tag{1.14}$$

The scalar functions F and L are also continuous and
differentiable up to any order required in their arguments.
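
As a minimal sketch of how the dynamics (1.11) and the cost functional (1.14) can be evaluated for a given pair of open-loop controls, the following Python fragment integrates the state with forward Euler and accumulates the running cost with a rectangle rule. The function name `evaluate_cost`, the scalar toy dynamics, and the particular F and L are assumptions chosen for illustration; they are not the models used later in this report.

```python
import numpy as np

def evaluate_cost(f, L, F, x0, u, v, T, steps=1000):
    """Approximate J = F(x(T), T) + integral_0^T L dt (forward Euler)."""
    dt = T / steps
    x = np.asarray(x0, dtype=float)
    J = 0.0
    for k in range(steps):
        t = k * dt
        uk, vk = u(t), v(t)
        J += L(x, uk, vk, t) * dt      # rectangle rule for the running cost
        x = x + f(x, uk, vk, t) * dt   # Euler step of the dynamics
    return J + F(x, T)

# Hypothetical toy game: xdot = v - u, L = (u^2 - v^2)/2, F = x(T)^2 / 2.
f = lambda x, u, v, t: v - u
L = lambda x, u, v, t: 0.5 * (u**2 - v**2)
F = lambda x, T: 0.5 * float(x @ x)

zero = lambda t: 0.0
one = lambda t: 1.0
J_idle = evaluate_cost(f, L, F, [1.0], zero, zero, 1.0)   # nobody acts
J_chase = evaluate_cost(f, L, F, [1.0], one, zero, 1.0)   # U drives x to 0
```

In this toy case both costs come out near 0.5: with no control the whole cost is terminal, while with u = 1 the terminal cost vanishes and the same amount is paid as control effort.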

The feasible regions U and V generally preclude the
use of infinite controls by either player. They usually impose
hard limits on the control vectors. Equation (1.11)
implies that both players agree on the dynamic structure of
the system. In reality this is always an approximation.
The practicability of the solution then depends on how closely
(1.11) can be made to represent the real system. This is
not surprising, however, since in all mathematical modelling
of real physical problems, some simplifying assumptions generally
have to be made to ensure the mathematical and computational
tractability of the model.

Note that generally the terminal time T does not have
to be given explicitly. Instead, a terminal manifold of the
form $\psi(x(T), T) = 0$ can be prescribed. This, however, cannot
ensure the termination of the game. Therefore, the fixed
duration game, which can be considered a special case of a
terminal manifold where $\psi(x(T), T) = T - t_f = 0$, is chosen here
to eliminate any termination problem that may arise, in
order that other important concepts can be investigated and
clarified.

It is also interesting to note that if hard limits are
not imposed on the controls in (1.12) and (1.13), then
additional assumptions will have to be made on the controls
to ensure that the magnitude of the optimizing controls will
be finite. This is done by McFarland. Briefly, he
defines regions of finite controls for the two players such that
whenever U tries to use an infinite control in minimizing, V can
choose v(t) from his finite region to drive the cost functional very large.
Similarly, whenever V tries to use an infinite control in
maximizing, U can choose u(t) from his finite region to obtain a very small
value for the cost functional. These assumptions are not
required in this report.


1.4 Objective and Scope of Dissertation

As mentioned in the opening statements of this
chapter, an efficient algorithm is needed before a solution
of a pursuit and evasion differential game can be implemented
effectively. Most of the results of the computational
methods developed so far have been found under the assumption
that a saddle point exists. A discussion of this
point and a counterexample will be presented in the next
chapter. McFarland (18) worked out an algorithm without
assuming the existence of a saddle point. He uses Differential
Dynamic Programming from the work of Jacobson and Mayne
for inner optimization and a gradient method for outer
optimization. McFarland's results, however, do not include any
hard limit on the controls or any state constraint as in (1.12)
and (1.13). Our work will then be as follows:

1.4.1 Using an approach similar to McFarland's, an
algorithm will be developed to handle hard limits on the control
and state variables of differential games. The Differential
Dynamic Programming used in the inner optimization will be
modified to handle the constraints. Gradient projection
schemes will be used for the outer optimization.
This will be presented in Chapter 2.
1.4.2 A linear quadratic pursuit and evasion
differential game will be investigated. The case without
any control constraint will be solved analytically. Through
a simple illustrative example, the physical outcomes corresponding
to the parameters of the problem will be investigated. The
case with control constraints cannot be solved analytically.
Two numerical solutions will be offered, one using the
algorithm developed in 1.4.1 and another using an indirect
approach that assumes the existence of a saddle
point and applies Differential Dynamic Programming
directly, similar to the algorithm used by Neeland and
later by [name illegible].
Any similarities or discrepancies, advantages and disadvantages
between the two methods will be reported. Chapter 3 will
cover this.

1.4.3 Chapter 4 will cover a stochastic nonlinear
model for a missile anti-missile intercept problem. A mathematical
model will be developed using a set of sufficient
statistics as state variables. The problem will then be
solved using the algorithm developed in Chapter 2.

1.4.4 Chapter 5 will summarize all the results
accumulated in this report. Recommendations for future
research will be presented.
CHAPTER 2


DEVELOPMENT  OF  NUMERICAL  ALGORITHM 


It is widely accepted that a general differential
game usually requires a numerical solution. In such
a case, only an open-loop form of solution can be
generated. However, if a numerical algorithm can be developed
with such simplicity that the total computation and implementation
time is much less than the duration of the game,
the process can be repeated with either the determined or
the approximated new states treated as initial states,
depending upon whether the problem is deterministic or
stochastic. In this manner, a closed-loop solution can be
approached as a limit as the computation and implementation
times get smaller and smaller.
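
The re-solving idea can be sketched on a toy single-controller problem, which is an assumption made purely for illustration and is not the game treated in this report. The hypothetical function `open_loop_plan` stands in for the numerical algorithm, and the constant disturbance `d` stands in for whatever the open-loop plan did not anticipate.

```python
def open_loop_plan(x, t, T):
    """Hypothetical stand-in solver: constant control that would bring x to 0 at T."""
    return -x / (T - t)

def simulate(replan, x0=1.0, T=1.0, steps=100, d=0.5):
    """Integrate xdot = u + d, where d is unknown to the planner."""
    dt = T / steps
    x = x0
    u = open_loop_plan(x0, 0.0, T)          # plan once from the initial state
    for k in range(steps):
        t = k * dt
        if replan:
            u = open_loop_plan(x, t, T)     # re-solve from the current state
        x += dt * (u + d)
    return x

miss_open_loop = simulate(replan=False)   # plan is never corrected
miss_replanned = simulate(replan=True)    # plan is refreshed every step
```

In this toy problem the open-loop run misses the target by roughly d times T, while the repeatedly re-solved run misses by roughly d times the step length, so the miss shrinks as the re-solving interval shrinks.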

On the other hand, one must be careful that, in trying
to simplify the problem, assumptions are not made that sacrifice
pertinent physical realism. Thus the
control engineer must strive for the delicate balance
between these two points. This is an optimization problem
in itself. The solutions which will be presented in this
report will not be claimed as optimal in this sense, but they
will be developed with these two points in mind.

Before we start the development of
the numerical algorithm, it is appropriate to
discuss the meaning of the solution of a differential game
to get a clear picture of what we are looking for. The state
of the art in numerical solution can then be surveyed to
pave our way towards the solution. The actual algorithm will
be composed of two parts: the inner optimization, using
Differential Dynamic Programming with state and control
constraints, and the outer optimization, using gradient
projection. Finally, we shall conclude this chapter with the
details of the steps of the algorithm.
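
The two-level structure described above can be sketched on a static toy game rather than a differential game. In this sketch plain projected gradient ascent replaces Differential Dynamic Programming in the inner loop, and the outer gradient is evaluated at the inner maximizer, which is valid when that maximizer is unique. The function `J`, the bounds of plus and minus one, and the step sizes are all illustrative assumptions.

```python
import numpy as np

def J(u, v):
    """Toy cost: convex in u (minimizer), concave in v (maximizer)."""
    return (u - 3.0)**2 + 2.0 * u * v - v**2

def dJ_du(u, v):
    return 2.0 * (u - 3.0) + 2.0 * v

def dJ_dv(u, v):
    return 2.0 * u - 2.0 * v

def project(z, lo=-1.0, hi=1.0):
    """Projection onto the hard limits lo <= z <= hi."""
    return float(np.clip(z, lo, hi))

def inner_max(u, v0=0.0, step=0.25, iters=200):
    """Inner optimization: V's best reply to a fixed u (projected ascent)."""
    v = v0
    for _ in range(iters):
        v = project(v + step * dJ_dv(u, v))
    return v

def outer_min(u0=0.0, step=0.1, iters=200):
    """Outer optimization: projected descent on max_v J(u, v)."""
    u = u0
    for _ in range(iters):
        v = inner_max(u)
        u = project(u - step * dJ_du(u, v))
    return u, inner_max(u)

u_hat, v_hat = outer_min()
```

For this toy cost the unconstrained outer minimum would lie at u = 1.5, so the projection holds the result on the boundary u = 1, with V's best reply also at 1.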


2.1  Differential  Game  Solution 


In game theory, the solution for each player is the
strategy that he chooses from among many
possible ones. In choosing his strategy, a player cannot
be sure about the outcome of the game because he does not
have any "a priori" knowledge about his opponent's choice.
This is the fact that causes more complications in the
theory of differential games than would arise if it were simply an
extension of optimal control theory.

In a two-person zero-sum differential game, the players
are two adversaries confronting one another, with one's loss
being the other's gain. Therefore, each player will try to
minimize the maximal loss his opponent can cause. The
strategy that realizes this outcome becomes his solution.
Once the solution is found, the player does not care what
strategy his opponent will use. He is that much better off
if his opponent does not use the strategy that causes him
the maximal loss. It is for this reason that some authors
have called this type of solution the security level of the
game.

In the following discussion, let us designate the two
opponents as follows:

minimizing player = U, using control = u(t)

maximizing player = V, using control = v(t)

t is any instant in the interval [0,T]

the cost functional of the game, J(u,v), is in the form
of equation (1.14)

First, let us look at the game from the point of view of the
minimizing player, U. For any arbitrary control u(t) that he
chooses, he is assured that the cost will be at most

$$ J(u,\hat v) = \max_{v(t)} J(u,v) \tag{2.1} $$

Naturally, U will choose the control u(t) that minimizes
this maximum cost:

$$ J(\hat u,\hat v) = \min_{u(t)} \left[ \max_{v(t)} J(u,v) \right] \tag{2.2} $$

Thus, the solution for U is the so-called minmax solution
$\hat u(t)$. Note that U does not care whether V will use $\hat v(t)$ or
not, because from equation (2.2) we can see that

$$ J(\hat u,\hat v) \ge J(\hat u,v) \quad \text{for } v \ne \hat v \tag{2.3} $$

The equality in equation (2.3) is included because of the
possible non-uniqueness of $\hat v$. Therefore, U will
usually gain if V uses any control other than $\hat v$.

Now, from V's point of view, for any arbitrary control
v(t),

$$ J(\tilde u,v) = \min_{u(t)} J(u,v) \tag{2.4} $$

V is assured that the cost will be at least $J(\tilde u,v)$. Since
his objective is to maximize, V will choose the $\tilde v(t)$ that
maximizes this minimum cost:

$$ J(\tilde u,\tilde v) = \max_{v(t)} \left[ \min_{u(t)} J(u,v) \right] \tag{2.5} $$

Thus, $\tilde v$ is the maxmin solution for V. Again, V does not
care whether U uses $\tilde u$ or not. From (2.5) it is clear that V
will almost always gain, and at least will not lose, if U
uses any control other than $\tilde u$, because

$$ J(\tilde u,\tilde v) \le J(u,\tilde v) \quad \text{for } u \ne \tilde u \tag{2.6} $$

The net cost of the game is $J(\hat u,\tilde v)$. Generally, each player
will usually benefit from using his secure strategy, as the
following inequality shows:

$$ J(\tilde u,\tilde v) \le J(\hat u,\tilde v) \le J(\hat u,\hat v) \tag{2.7} $$


From a general viewpoint, there is no reason to
presume that

$$ \hat u = \tilde u \quad \text{and} \quad \hat v = \tilde v \tag{2.8} $$

If (2.8) is true, however, $\hat u$ and $\tilde v$ constitute a saddle
point for the game, and the net cost, $J(\hat u,\tilde v)$, is called the
value of the game.

Definition 1: The differential game described in section
1.3 is said to have a value if

$$ \min_{u(t)} \left[ \max_{v(t)} J(u,v) \right] = \max_{v(t)} \left[ \min_{u(t)} J(u,v) \right] \tag{2.9} $$

where u ranges over U and v ranges over V.

Definition 2: If a game has the value J*, and if there
exists a pair (u*,v*) such that J* = J(u*,v*) and

$$ J(u^*,v) \le J(u^*,v^*) \le J(u,v^*) \tag{2.10} $$

then u* is optimal for U and v* is optimal for V.
The pair (u*,v*) is called a saddle point, and u* and v* are
called pure strategy solutions.
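As a hedged illustration of Definitions 1 and 2, the following Python sketch computes the two security levels for a finite cost table, where rows stand in for U's choices and columns for V's. The function names and the two small tables are illustrative assumptions and do not come from the report.

```python
def security_levels(J):
    """Return (minmax, maxmin) for a cost table J[i][j].

    Row i is a choice of the minimizing player U, column j a choice of
    the maximizing player V (an assumed finite stand-in for u(t), v(t)).
    """
    n_rows, n_cols = len(J), len(J[0])
    # U's security level: the smallest worst-case (row maximum) cost.
    minmax = min(max(J[i][j] for j in range(n_cols)) for i in range(n_rows))
    # V's security level: the largest guaranteed (column minimum) cost.
    maxmin = max(min(J[i][j] for i in range(n_rows)) for j in range(n_cols))
    return minmax, maxmin


def has_value(J):
    """A value in pure strategies exists when the two levels coincide, as in (2.9)."""
    minmax, maxmin = security_levels(J)
    return minmax == maxmin


no_saddle = [[1, -1], [-1, 1]]   # hypothetical table without a saddle point
saddle = [[3, 1], [4, 2]]        # hypothetical table with a saddle point at row 0, column 0
```

In the first table the minmax level exceeds the maxmin level, so neither player's secure choice forms a saddle point; in the second the two levels coincide.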

Most previous work on differential games has
concentrated on pure strategy solutions and the conditions
under which they exist. However, for a general nonlinear
nonquadratic problem a saddle point does not generally exist.
Two examples will be shown here to demonstrate this point.
The first one is due to McFarland and concerns a single-stage
static game. The second one is due to Berkovitz and
concerns a differential game with nonlinear dynamics.

Example 1: Let the controls be scalars u and v, and let the cost
be the following polynomial function of u and v:

J(u,v)  =  (u^-2u2+2) (-v^  +  v2  +  2)  +  {u3-3u) (v^-2v)  (2.11) 

18  18  I  18  9 

The cost function is formed in such a way that neither U nor
V can use infinite control in the optimization process.
Using the previous terminology, the admissible control sets are
$\{v : |v| < 2\}$ for V and $\{u : u \in R\}$ for U,
where R is the real line. The solutions to
this problem are:

For player U, minmax: $\hat u = +1$, ($\hat v = +1$), $J(\hat u,\hat v) = 1$

For player V, maxmin: $\tilde v = 0$, ($\tilde u = +1$), $J(\tilde u,\tilde v) = 2/3$

Net cost of the game: $J(\hat u,\tilde v) = 2/3$


It is interesting to note that McFarland called the points
(+1,+1) and (-1,-1) local saddle points. These points,
however, are not saddle points according to the definition
given above. Using Definition 2, it is clear that a
saddle point does not exist in this problem. The difference
arises because McFarland defined a saddle point as
a point where the gradients of the cost with respect to
the controls vanish simultaneously, accompanied by some simple
second-order conditions. In this report, the name saddle
point will be reserved for a point where the control
pairs of the minmax and the maxmin solutions are the same
and a pure strategy exists. McFarland also worked out the
solutions of this example in detail; they will not be
repeated here.

Example 2: For a game of fixed final time T > 0, play
terminates at t = T. The cost function being minimized by U
and maximized by V is given by

$$ J(u,v) = \int_0^T x \, dt \tag{2.12} $$

The state x is determined by the dynamic equation and the
initial condition

$$ \dot x = (v - u)^2, \qquad x(0) = x_0 \tag{2.13} $$

The controls are constrained by u = U(t,x), where U is
piecewise continuously differentiable on the interval [0,T]
and $0 \le U(t,x) \le 1$, and v = V(t,x), where V is also
piecewise continuously differentiable on the interval [0,T] and
$0 \le V(t,x) \le 1$.

Maxmin Solution:

For any arbitrary control chosen by V, U can choose the same
strategy and thus guarantee that $\dot x = 0$ on the whole
interval [0,T]. Therefore,

$$ \max_v \min_u J(u,v) \le x_0 T $$

For any pair (u,v), however, it is obvious that $\dot x \ge 0$.
Thus,

$$ J(u,v) = \int_0^T x \, dt \ge x_0 T $$

Therefore,

$$ \max_v \min_u J(u,v) = x_0 T \tag{2.14} $$


Minmax Solution:

For any arbitrary control chosen by U, V can practically
guarantee that $\dot x \ge 1/4$ on [0,T] by choosing his strategy as
follows:

$$ v = \begin{cases} 1 & \text{if } u \le 1/2 \\ 0 & \text{if } u > 1/2 \end{cases} $$

Using this strategy, V can make $x(t) \ge x_0 + t/4$. Hence,

$$ \min_u \max_v J(u,v) \ge x_0 T + \frac{T^2}{8} \tag{2.15} $$

Now, if U chooses $u \equiv 1/2$ on [0,T], then for any v,
$\dot x \le 1/4$ on [0,T]. Thus, upon integrating, we have

$$ \min_u \max_v J(u,v) \le x_0 T + \frac{T^2}{8} \tag{2.16} $$

From (2.15) and (2.16) we can conclude that

$$ \min_u \max_v J(u,v) = x_0 T + \frac{T^2}{8} \tag{2.17} $$

In summary, we have

$$ J(\tilde u,\tilde v) = x_0 T, \qquad J(\hat u,\hat v) = x_0 T + \frac{T^2}{8} $$

Therefore, for this problem

$$ J(\tilde u,\tilde v) < J(\hat u,\hat v) $$

Again, a saddle point does not exist in this game, and the
game does not have a value in pure strategies.
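The two bounds of Example 2 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses open-loop control samples on a uniform grid and a forward Euler approximation, and the names `game_cost` and `v_reply`, the sample controls, and the values x0 = 2 and T = 1 are chosen here and are not part of the report.

```python
def game_cost(u, v, x0, T):
    """Euler approximation of J = integral of x dt with dx/dt = (v - u)^2."""
    dt = T / len(u)
    x, J = x0, 0.0
    for uk, vk in zip(u, v):
        J += x * dt
        x += (vk - uk) ** 2 * dt
    return J


def v_reply(u):
    """V's counter-strategy from the minmax argument: v = 1 if u <= 1/2, else 0."""
    return [1.0 if uk <= 0.5 else 0.0 for uk in u]


x0, T, n = 2.0, 1.0, 1000
lower = x0 * T                 # maxmin value claimed in (2.14)
upper = x0 * T + T ** 2 / 8    # minmax value claimed in (2.17)

samples = [
    [k / n for k in range(n)],                    # ramp from 0 toward 1
    [0.5] * n,                                    # U's minmax control
    [0.3 if k % 2 else 0.9 for k in range(n)],    # rapidly switching control
]

# U copying V's control keeps dx/dt = 0, so the cost stays at the lower value.
copy_costs = [game_cost(list(v), v, x0, T) for v in samples]
# V's reply forces dx/dt >= 1/4, so the cost is at least the upper value
# (up to the discretization error of the Euler sum).
reply_costs = [game_cost(u, v_reply(u), x0, T) for u in samples]
# Against u = 1/2, no admissible v can push the cost above the upper value.
half = [0.5] * n
half_costs = [game_cost(half, v, x0, T) for v in samples + [v_reply(half)]]
```

The gap between `lower` and `upper` is T²/8, which is the gap between the maxmin and minmax values in the example.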

From these examples, we see that even for simple games
(the first is a static game, and the second, although it has
nonlinear dynamics, has a linear cost) a saddle point does not
have to exist. Sufficiency conditions for saddle points
have been worked out by many authors, but they are restricted to
a very limited class of differential games.

The question then arises as to what the true solution
of a differential game is when a saddle point does
not exist. The celebrated Minimax Principle of game
theory asserts that in this case the players can find fixed
probability laws from which a random strategy (among those
possible) can be selected in such a way that the average
value of the cost sustained by each player approaches the
value of the game in the long run. The probability laws
that have to be found constitute the mixed-strategy
solution. The main disadvantage of this type of solution
is that it is not only hard to implement but also exceedingly
complicated, if not impossible, to solve for in a realistic
pursuit-evasion differential game. All the researchers
who worked on mixed strategies had to resort to very simple
problems that bear little or no physical significance.

At present, we should be content with the
security level type of solution. The minmax and the maxmin
solutions provide the least-maximum-loss strategy for each
player. If a player can accept the cost accrued from
using his least-maximum-loss strategy, then he can rest
assured that he will not be worse off no matter what
strategy his opponent uses. One criticism of
this type of solution is that it is too conservative.

However, since numerical solutions are
needed for all realistic pursuit-evasion differential games,
all the strategies implemented will be suboptimal to some
extent. The less complicated the solution is, the closer
it will be to a true optimum. Together with the computation
and implementation time involved, this should more than
outweigh any advantage that could be gained by using the
mixed-strategy solution. This report is therefore aimed at
finding an efficient algorithm to solve for the
least-maximum-loss strategies, that is, the minmax and the maxmin solutions,
without assuming the existence of a saddle point.

2.2  The  State  of  the  Art  on  Numerical  Solution 

As mentioned in section 1.1, Berkovitz used a
calculus of variations approach to form a set of equations
for the two-person zero-sum differential game. Blaquiere (22)
has a similar development, but the emphasis there is
on the geometric aspects of the game. These works have
become the foundation for most numerical methods
developed thereafter. Most authors require the assumption
that a saddle point exists in order to provide the
pure-strategy solution. The main feature of these techniques
is that a two-point boundary value problem (TPBVP) must be
solved. This type of problem is encountered very frequently
in optimal control theory and in mathematics. It involves
a set of differential equations with initial conditions
given on some variables and final conditions given on the
rest. Since the optimization process in this approach
does not evaluate the cost function directly
in each iteration, these techniques have been labeled indirect methods.
Bryson and Ho suggested that numerical methods for the
TPBVP can be categorized into three types: gradient,
quasilinearization (Newton-Raphson), and neighbouring
extremal.

Recently, Jacobson and Mayne added a very
efficient new technique for solving the optimal control problem,
Differential Dynamic Programming (DDP). This method
differs from the above indirect methods in that, rather than
solving the TPBVP, a set of associated equations
is derived with all the final values given. The task of
integrating backward is much simpler than the task of
solving the TPBVP. Moreover, DDP is
generally found to converge more rapidly than any of the three
methods mentioned above.

All three indirect methods mentioned have some common
features. Each method starts with some nominal solution
for which some boundary conditions are satisfied, and each
uses information provided by a linearized solution about
the nominal solution to improve the solution in successive
iterations until all the boundary conditions are satisfied.

The rate of convergence differs greatly as the methods are applied
to various problems. Generally, the gradient method
converges quickly at first but becomes relatively slow
near the optimum. Phenomena such as zig-zagging are
known to be closely associated with this method near
the optimal value. Newton-Raphson, or quasilinearization,
converges quadratically near the optimum, but the initial guess
must be chosen very carefully. The neighbouring
extremal method is generally even more sensitive to the initial
guess.

All gradient methods exhibit one common difficulty,
namely the so-called "step-size" problem. That is, after
a feasible direction is found using the gradient, how far
should the control correction be applied in that direction?
Too small a step size causes a drastic decrease in the
convergence rate, whereas too big a step size sometimes leads
to non-convergence. There are two basic techniques for dealing
with this problem. The first was devised by Jacobson
and Mayne and used effectively by McFarland in his
dissertation. It adjusts the time interval
$[t_1,T]$ on which the new control is applied in such a way
that the variation of the states is not too large. The
second technique was introduced by Jarmark:
quadratic terms in the controls are added to the
integral term of the cost functional, and the weighting
matrices are chosen in such a way that, again, the variation
in the states is acceptable. Both techniques have exhibited
very good convergence properties.
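The step-size problem can be seen on the simplest possible case. The following Python sketch is an illustration only: the cost J(u) = u², the three step values, and the function name are assumptions chosen here, not an example from the report.

```python
def gradient_descent(step, u0=1.0, iterations=50):
    """Minimize J(u) = u^2 by fixed-step gradient descent; dJ/du = 2u."""
    u = u0
    for _ in range(iterations):
        u -= step * 2.0 * u
    return u


too_small = gradient_descent(0.01)   # slow: u shrinks by only 2% per iteration
well_sized = gradient_descent(0.5)   # reaches the minimum u = 0 in one step
too_large = gradient_descent(1.1)    # overshoots; |u| grows by 20% per iteration
```

After the same number of iterations the small step is still far from the minimum, while the large step has diverged, which is the trade-off both techniques above are designed to manage.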

Leffler developed, theoretically, a numerical algorithm
containing two phases. The first is called the "gradient"
phase, in which the directions of the control changes are
computed, and the second is called the "restoration" phase,
which is needed to keep the new control within the feasible
region. Theoretically, Leffler's algorithm is capable of
handling constraints on both states and controls.
Computationally, however, the pursuit-evasion problem that he solved
does not include any significant constraint on either the
states or the control inputs of either player.


On the application side, Robert and Montgomery
attempted to solve a classical pursuit-evasion problem. The
distance between the two aircraft at the time of closest
approach was used as the cost index. They were successful
in obtaining the optimal trajectories for most initial
conditions. They also found some regions where the trajectories
are not unique and remedied the situation by adding a
time-until-interception term to the cost function. The dynamics
were nonlinear and the controls were subject to hard
constraints. The computation time required, however, was very
large: approximately ten times the engagement time in their report.

The most complex air-to-air combat model documented so far
was worked out by Lynch. The emphasis was on formulating
and solving as realistic a mathematical model of air-to-air
combat as possible. A three-dimensional model was used.
All the factors involved were considered. Thrust, drag, and
lift were stored as nonlinear functions of altitude and
airspeed. The controls were roll angle, thrust, and turn
rate, with the latter two subject to hard constraints. The
cost index was the time required for the pursuer to
maneuver closer to the evader than some given radius. Again,
Lynch used the gradient method with the same step-size
adjustments as Robert and Montgomery and obtained satisfactory
convergence for most initial conditions. He also reported
singular surfaces where non-unique solutions were
encountered. Needless to say, the computation time needed
was horrendous: roughly one hundred times the simulated
encounter time.

Leatham also studied the model mentioned above,
using the method of "neighbouring optimal trajectories".
This method is closely related to the "successive sweep"
method of Dyer and McReynolds. The details will not be
described here; interested readers can refer to the above
references. It is worth mentioning, however,
that even if the missing initial conditions can be guessed
accurately, the computation time for this method was roughly
twenty times the engagement time. Dethlefsen (41)
performed an analytical study of the neighbouring
optimal method for a much simpler problem. No numerical
results, however, were included in that report.

Graham (42) extended the quasilinearization technique of
optimal control to cover the first-order necessary conditions
he derived for differential games. The technique was then
used to solve a pursuit-evasion game involving a
ground-to-air interceptor and a supersonic airplane, essentially the
same unconstrained problem solved by Leffler. This method
is very sensitive to the choice of the initial trajectory.
The computation time is approximately ten
times the encounter time.

Neeland used Differential Dynamic Programming (DDP)
to develop an algorithm for solving a realistic air-to-air combat
game under the assumption that a saddle point exists.
Even though the development of the algorithm contains
terms up to second order, the algorithm actually used to solve
the pursuit-evasion problem is a first-order
algorithm, which was practically required because he was
looking for a very fast computation time. He reduced
the computation time to less than the engagement
time in the non-singular case. Jarmark also confirmed this
for a large number of sample problems. Therefore, we must
conclude that, of all the techniques available so far,
Differential Dynamic Programming is the most efficient one.
A detailed development of the first-order DDP algorithm is
included in the next section.

All the methods discussed so far assume the
existence of a saddle point. Reports with no
a priori assumption of a saddle point have been rare indeed.
McFarland produced one such report. Besides making no
assumption of a saddle point, his technique differs from
the indirect methods in that the cost function must be
evaluated in each iteration. Therefore, McFarland's
technique is sometimes referred to as a direct method
or a direct solution technique. Briefly, for an arbitrary
control, the inner optimization of this method is performed
by using second-order DDP to locate all the local maxima
(minima) created by the opponent's control. The player's
control is then adjusted by using either the "steepest descent"
or the "conjugate gradient" method. This adjustment is
called the outer optimization. The process is then
repeated. Since McFarland does not consider any control
or state constraints, the iteration terminates either when
the variation of the Hamiltonian with respect to the player's
control is negligible, in which case a saddle point is located,
or when a cross-over point is located, in which case the
solution will not be a saddle point. The exact definition
of a cross-over point will be given in the following section.

2.3  Differential Dynamic Programming with State and Control
Constraints

We shall start our development with the inner optimization
process. Even though the actual derivation is for a
maxmin solution, it can also be applied to a minmax solution
simply by substituting the control variables and
interchanging the minimization and the maximization within
the procedure.

2.3.1  Derivation of DDP with State and Control Constraints

For a maxmin solution, with any arbitrary control v(t)
chosen by V, the differential game formulated in section
1.3 becomes a constrained optimal control problem as follows:

Player U now chooses his control strategy to drive the
dynamic system whose equation of motion is

$$ \frac{d}{dt} x(t) = f(x(t),u(t),t); \qquad x(0) = x_0 \tag{2.18} $$

where

t ∈ [0,T] = Θ, T is fixed

x(t) ∈ X, an n-dimensional Euclidean space

u(t) ∈ U

$$ U = \{ u : \Theta \to R^m, \; g(x,u,t) \le 0 \} \tag{2.19} $$

where $R^m$ is an m-dimensional Euclidean space; the
mapping is bounded by the constraint vector
$g(x,u,t) \le 0$ of dimension ≤ m.

U is trying to minimize the following cost function:

$$ J(u(t)) = \int_0^T L(x,u,t) \, dt + F(x(T),T) \tag{2.20} $$


Since the starting time is arbitrary, we can rewrite
this cost function using the imbedding principle:

$$ J(x(t),t,u(\tau)) = F(x(T),T) + \int_t^T L(x(\tau),u(\tau),\tau) \, d\tau \tag{2.21} $$

We then define

$$ J^*(x(t),t) = \min_{u(\tau)} J(x(t),t,u(\tau)) = \min_{u(\tau)} \left[ F(x(T),T) + \int_t^T L(x(\tau),u(\tau),\tau) \, d\tau \right] \tag{2.22} $$


By using the well-known principle of optimality of optimal
control theory, (2.22) can be rewritten as follows:

$$ J^*(x(t),t) = \min_{u(\tau)} \left[ F(x(T),T) + \int_t^{t+\Delta t} L \, d\tau + \int_{t+\Delta t}^T L \, d\tau \right] \tag{2.23} $$

The arguments of the function L in (2.23) are the same
as those in equation (2.22). Combining the first and last
terms in the bracket and using the above definition, we have

$$ J^*(x(t),t) = \min_{u(\tau)} \left[ J^*(x(t+\Delta t),t+\Delta t) + \int_t^{t+\Delta t} L \, d\tau \right] \tag{2.24} $$

Expanding $J^*(x(t+\Delta t),t+\Delta t)$ in a Taylor series about (x(t),t)
for small Δt,

$$ 0 = \min_{u(t)} \left[ \frac{\partial J^*}{\partial t} \Delta t + \left( \frac{\partial J^*}{\partial x} \right)^T f(x(t),u(t),t) \, \Delta t + L(x(t),u(t),t) \, \Delta t + o(\Delta t) \right] \tag{2.25} $$

where $o(\Delta t)/\Delta t \to 0$ as $\Delta t \to 0$.

Dividing (2.25) throughout by Δt and letting Δt approach zero,
we have

$$ \frac{\partial J^*}{\partial t} + \min_{u(t)} \left[ L(x(t),u(t),t) + \left( \frac{\partial J^*}{\partial x} \right)^T f(x(t),u(t),t) \right] = 0 \tag{2.26} $$

where the partial derivatives are evaluated at the point (x(t),t).
This is the well-known Bellman equation of optimal
control theory, and it serves as the starting point for DDP. Define
the Hamiltonian as

$$ H(x,u,J_x^*,t) = L(x,u,t) + J_x^{*T} f(x,u,t), \qquad J_x^* \equiv \frac{\partial J^*}{\partial x} = \left[ \frac{\partial J^*}{\partial x_1}, \frac{\partial J^*}{\partial x_2}, \ldots, \frac{\partial J^*}{\partial x_n} \right]^T \tag{2.27} $$

where the superscript T stands for transpose. Then (2.26) becomes

$$ J_t^*(x^*(t),t) + \min_{u(t)} H(x^*(t),u(t),J_x^*,t) = 0 \tag{2.28} $$
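As a quick check on (2.26) to (2.28), consider a scalar problem that is not in the report and is chosen here only for illustration: $\dot x = u$, $L = x^2 + u^2$, $F = 0$, with no constraints. Trying a solution of the assumed form $J^* = p(t) x^2$ gives:

```latex
\begin{aligned}
H(x,u,J_x^*,t) &= x^2 + u^2 + 2\,p(t)\,x\,u, \\
u^* &= \arg\min_u H = -p(t)\,x, \\
J_t^* + \min_u H &= \dot p\,x^2 + x^2 - p^2 x^2 = 0
  \;\;\Longrightarrow\;\; -\dot p = 1 - p^2, \quad p(T) = 0, \\
p(t) &= \tanh(T - t), \qquad J^*(x,t) = x^2 \tanh(T - t).
\end{aligned}
```

The terminal condition p(T) = 0 follows from J*(x(T),T) = F = 0, so the equation for p is integrated backward from T with all final values known.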


Since we are dealing with state and control constraints,
a penalty term can be incorporated into the cost functional
as follows:

$$ J(x(t),u(t),t) = F(x(T),T) + \int_t^T \left[ L(x,u,t) + \mu^T g(x,u,t) \right] dt \tag{2.29} $$

With constraints, the Bellman equation becomes

$$ J_t^*(x^*(t),t) + \min_{u \in U} \left[ H(x^*(t),u(t),J_x^*,t) + \mu^T g(x^*(t),u(t),t) \right] = 0 \tag{2.30} $$

where μ is the Lagrange multiplier vector, which can be solved
for by using the Kuhn-Tucker conditions of nonlinear
programming. This is included in Appendix A.

For now, it suffices to say that the vector multiplier
function μ(x,u,t) is identically equal to zero when the
corresponding constraint is a strict inequality. Otherwise,

$$ \mu(x,u,t) \ge 0 \quad \text{for } u \in \partial U \tag{2.31} $$

where ∂U designates the boundary of U, that is, when the constraint
is a strict equality.
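The behaviour of the multiplier can be illustrated on a one-dimensional stand-in. In the Python sketch below, the quadratic H(u) = (u - c)², the single constraint g(u) = u - u_max ≤ 0, and the function name are assumptions made for illustration; the report's own treatment is in its Appendix A.

```python
def constrained_min(c, u_max=1.0):
    """Minimize H(u) = (u - c)^2 subject to g(u) = u - u_max <= 0.

    Returns (u_star, mu), where mu is the Kuhn-Tucker multiplier from
    the stationarity condition dH/du + mu * dg/du = 0.
    """
    if c <= u_max:
        return c, 0.0              # constraint inactive: multiplier is zero
    u_star = u_max                 # constraint active: u lies on the boundary
    mu = -2.0 * (u_star - c)       # mu = -dH/du, which is positive here
    return u_star, mu


inside = constrained_min(0.4)      # unconstrained minimizer is feasible
on_boundary = constrained_min(1.5) # unconstrained minimizer violates g <= 0
```

In both cases the product of the multiplier and the constraint value is zero, which is the complementary slackness property used later in the derivation.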

It is well known that, except for a few simple
cases, the Bellman equation cannot be solved analytically. DDP
provides an excellent iterative procedure for numerical
solution. We now proceed to derive the pertinent equations
required for DDP.

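Consider equation (2.30). If there exists a nominal
trajectory $\bar x$ such that $x^*(t) = \bar x(t) + \Delta x(t)$, where Δx(t) is small on the
interval [0,T], then

$$ J_t^*(\bar x + \Delta x,t) + \min_{u \in U} \left[ H(\bar x + \Delta x,u,J_x^*,t) + \mu^T(\bar x + \Delta x,u,t) \, g(\bar x + \Delta x,u,t) \right] = 0 \tag{2.32} $$

Expanding (2.32) in a Taylor series to first order in Δx about
$\bar x$ and using the complementary slackness property of the
Kuhn-Tucker conditions, we get

$$ \frac{\partial}{\partial t} J^*(\bar x,t) + \frac{\partial}{\partial t} J_x^{*T}(\bar x,t) \, \Delta x + \min_{u \in U} \left[ H(\bar x,u,J_x^*,t) + H_x^T(\bar x,u,J_x^*,t) \, \Delta x + \left( J_{xx}^* f(\bar x,u,t) \right)^T \Delta x + \mu^T(\bar x,u,t) \, g_x(\bar x,u,t) \, \Delta x + \text{h.o.t.} \right] = 0 \tag{2.33} $$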

Here h.o.t. stands for higher-order terms in the Taylor series
expansion.

As Δx approaches zero, the higher-order
terms also tend towards zero, and we can split (2.33)
into two parts as follows:

$$ \frac{\partial}{\partial t} J^*(\bar x,t) + H(\bar x,u^*,J_x^*,t) = 0 \tag{2.34a} $$

$$ \frac{\partial}{\partial t} J_x^*(\bar x,t) + H_x(\bar x,u^*,J_x^*,t) + J_{xx}^* f(\bar x,u^*,t) + g_x^T(\bar x,u^*,t) \, \mu(\bar x,u^*,t) = 0 \tag{2.34b} $$

where $u^* = \arg\min_{u \in U} H(\bar x,u,J_x^*,t)$; in words, u* is the feasible
control that minimizes the Hamiltonian.

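Since $J^*(\bar x(t),t)$ and $J_x^*(\bar x(t),t)$ are functions of $\bar x(t)$
and t, their total derivatives with respect to t are

$$ \frac{d}{dt} J^*(\bar x,t) = \frac{\partial}{\partial t} J^*(\bar x,t) + J_x^{*T}(\bar x,t) \, f(\bar x,\bar u,t) \tag{2.35a} $$

$$ \frac{d}{dt} J_x^*(\bar x,t) = \frac{\partial}{\partial t} J_x^*(\bar x,t) + J_{xx}^* f(\bar x,\bar u,t) \tag{2.35b} $$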


We now define an estimated cost change at time t as

$$ a(t) = J^*(\bar x,t) - \bar J(\bar x,t) \tag{2.36} $$

where $\bar J(\bar x,t)$ is the nominal cost that occurs when U uses
the control strategy $\bar u(t)$. Note that

$$ \frac{d}{dt} \bar J(\bar x,t) = -L(\bar x,\bar u,t) \tag{2.37} $$

Substituting (2.35a) and (2.37) into the total time
derivative of definition (2.36), we have

$$ \dot a(t) = \frac{\partial}{\partial t} J^*(\bar x,t) + J_x^{*T}(\bar x,t) \, f(\bar x,\bar u,t) + L(\bar x,\bar u,t) \tag{2.38} $$

Notice that the last two terms on the right-hand side of
(2.38) form the definition of the Hamiltonian. Using (2.34a)
for $\partial J^*(\bar x,t)/\partial t$, (2.38) becomes

$$ -\dot a(t) = H(\bar x,u^*,J_x^*,t) - H(\bar x,\bar u,J_x^*,t) \tag{2.39} $$

Substituting (2.34b) into the negative of (2.35b), we obtain

$$ -\dot J_x^*(\bar x,t) = H_x(\bar x,u^*,J_x^*,t) + g_x^T(\bar x,u^*,t) \, \mu(\bar x,u^*,t) + J_{xx}^* \left[ f(\bar x,u^*,t) - f(\bar x,\bar u,t) \right] \tag{2.40} $$


Consider the last term of equation (2.40):

$$ J_{xx}^* \left[ f(\bar x,u^*,t) - f(\bar x,\bar u,t) \right] \approx J_{xx}^* f_u(\bar x,\bar u,t) \, \Delta u \tag{2.41} $$

Expanding the dynamic equation (2.18) about $(\bar x,\bar u)$ to first
order, we get a differential equation describing an
approximation of the deviation in x:

$$ \frac{d}{dt} \Delta x \approx f_x(\bar x,\bar u,t) \, \Delta x + f_u(\bar x,\bar u,t) \, \Delta u; \qquad \Delta x(0) = 0 \tag{2.42} $$

Throughout the derivation, we depend on the
assumption that Δx is small. Therefore, (2.42) can be
used to show that $f_u \Delta u$ will also be small for small
Δx. This suggests that the term (2.41) can be neglected.
Neglecting the $J_{xx}^*$ term in (2.40) introduces an error
$\Delta J_x^*(t)$ in $J_x^*(t)$ of order

$$ \int_t^T \left| u^*(t_1) - \bar u(t_1) \right| dt_1 \tag{2.43} $$


The integration is performed backward because the final
conditions of differential equations (2.40) and (2.41) are
given  as  we  shall  see  later.  We  define 


The error Δa(t) in a(t) is of order

$$ \int_t^T \left| u^*(t_1) - \bar u(t_1) \right| dt_1 \int_t^T \left| u^*(t_2) - \bar u(t_2) \right| dt_2 \tag{2.44} $$

From equation (2.39) it is clear that a(t) is of order

$$ \int_t^T \left| u^*(t_3) - \bar u(t_3) \right| dt_3 \tag{2.45} $$

From (2.44) and (2.45), we can see that if either T - t is of
order ε or $|u^* - \bar u|$ is of order ε, then the error Δa(t)
is of order ε² while a(t) itself is of order ε. Thus,
we have shown that, by neglecting the term (2.41), a(t) will
still be correct to first order.

At t = T,

$$ J^*(\bar x(T),T) = F(\bar x(T),T) \tag{2.46} $$

hence we have the following final conditions:

$$ a(T) = 0 \tag{2.47} $$

$$ J_x^*(\bar x(T),T) = F_x(\bar x(T),T) \tag{2.48} $$

In summary, the equations that constitute the heart of
DDP when state and control constraints
are present are

$$ \begin{aligned} -\dot a(t) &= H(\bar x,u^*,J_x^*,t) - H(\bar x,\bar u,J_x^*,t); & a(T) &= 0 \\ -\dot J_x^*(\bar x,t) &= H_x(\bar x,u^*,J_x^*,t) + g_x^T(\bar x,u^*,t) \, \mu(\bar x,u^*,t); & J_x^*(\bar x(T),T) &= F_x(\bar x(T),T) \end{aligned} \tag{2.49} $$

2.3.2  DDP  Computational  Procedure 

(1)  Using a nominal control $\bar u(t)$, integrate (2.18)
forward to obtain a nominal trajectory $\bar x(t)$. Store $\bar x(t)$ and $\bar u(t)$
together with the computed cost $\bar J(\bar x,t)$ from (2.20).

(2)  Integrate (2.49) backward from T to 0 while
simultaneously minimizing $H(\bar x,u,J_x^*,t)$ with respect to
u(t) ∈ U to get u*(t). Store u*(t) and a(t).

(3)  Integrate (2.18) forward again using u*(t) and
compute the resulting cost. If the actual cost change agrees
with the predicted cost change computed in (2), then u*(t)
can be accepted as the new nominal control.

(4)  If |a(0)| < ε, where ε is some small positive
number, then u*(t) is regarded as the optimal control. If not,
steps (1), (2) and (3) are repeated.

(5)  If the actual cost change differs too much from
the predicted cost change, then the "step-size" method can
be applied.
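Steps (1) to (4) above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the report's program: it uses a scalar test problem chosen here ($\dot x = u$, $L = x^2 + u^2$, $F = 0$, $|u| \le u_{max}$), forward Euler integration, and always accepts the new control, so step (5) is omitted. All function names and default values are hypothetical.

```python
import math


def simulate(u, x0, dt):
    """Steps (1) and (3): integrate dx/dt = u forward with Euler steps."""
    x = [x0]
    for uk in u:
        x.append(x[-1] + dt * uk)
    return x


def cost(x, u, dt):
    """Cost (2.20) with L = x^2 + u^2 and F = 0, as a left Riemann sum."""
    return dt * sum(xk * xk + uk * uk for xk, uk in zip(x, u))


def backward_sweep(x, u, dt, u_max):
    """Step (2): integrate (2.49) backward while minimizing H over |u| <= u_max."""
    n = len(u)
    jx = 0.0                     # J_x(T) = F_x = 0
    a = 0.0                      # a(T) = 0
    u_star = [0.0] * n
    for k in range(n - 1, -1, -1):
        # H = x^2 + u^2 + jx*u is minimized by clipping -jx/2 to the bounds.
        us = max(-u_max, min(u_max, -jx / 2.0))
        u_star[k] = us
        h_star = x[k] ** 2 + us ** 2 + jx * us
        h_nom = x[k] ** 2 + u[k] ** 2 + jx * u[k]
        a += dt * (h_star - h_nom)   # -da/dt = H(u*) - H(u_nominal)
        jx += dt * 2.0 * x[k]        # -dJx/dt = H_x = 2x (g does not depend on x)
    return u_star, a


def first_order_ddp(x0=1.0, T=1.0, n=200, u_max=1.0, eps=1e-10, max_iter=100):
    dt = T / n
    u = [0.0] * n                    # nominal control
    x = simulate(u, x0, dt)
    a0 = 0.0
    for _ in range(max_iter):
        u_star, a0 = backward_sweep(x, u, dt, u_max)
        if abs(a0) < eps:            # step (4): predicted change is negligible
            break
        u = u_star                   # accept the new control without step (5)
        x = simulate(u, x0, dt)
    return u, x, cost(x, u, dt), a0


u_opt, x_opt, j_opt, a_final = first_order_ddp()
reference = math.tanh(1.0)           # continuous-time optimum for x0 = 1, T = 1
```

For the assumed values x0 = 1 and T = 1 the control bound is inactive at the optimum, and the continuous-time optimal cost is tanh(1) by the standard scalar Riccati solution, so the returned cost should agree with `reference` up to discretization error. On harder problems the unconditional acceptance used here would need to be replaced by the step-size adjustment of the next subsection.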

2.3.3  Step-Size  Adjustment 

Substituting the minimizing control of each iteration,
u*(t), into (2.18), we obtain

$$ \frac{d}{dt} (\bar x + \Delta x) = f(\bar x + \Delta x,u^*,t); \qquad \bar x(0) + \Delta x(0) = x_0 \tag{2.50} $$

Because Δx(0) = 0, the size of Δx(t) is due to the
variation in control $\Delta u = u^* - \bar u$, as can be seen from equation
(2.50). One way to restrict the size of Δx(t) is to
restrict the interval of time over which (2.50) is
integrated. This is desirable since we do not wish to alter
the size of u*(t) found by minimizing
$H(\bar x,u,J_x^*,t)$ in step (2) of the algorithm.

Throughout the derivation of the DDP equations
(2.49), we assumed that Δx is small.
This is an important assumption, because if Δx is not small
enough, the higher-order terms in (2.33) will not be
negligible. This in turn causes the actual cost change,
ΔJ, to deviate too much from the predicted cost change, a.
Let $0 \le t_1 \le T$ and use the nominal control $\bar u(t)$ for
$0 \le t \le t_1$; then $x(t_1) = \bar x(t_1)$ and Δx(t) = 0 for $0 \le t \le t_1$.
Now use the minimizing control u*(t) to integrate (2.50)
forward from $t_1$ to T. If $[t_1,T]$ is small, then Δx will be
small for finite Δu. Note that

$$ a(\bar x,t_1) = \int_{t_1}^T \left[ H(\bar x,u^*,J_x^*,t) - H(\bar x,\bar u,J_x^*,t) \right] dt \tag{2.51} $$

One criterion for determining whether the actual cost change
"agrees" with the predicted cost change is as follows:

$$ \frac{\Delta J}{a(\bar x,t_1)} > C; \qquad C \ge 0 \tag{2.52} $$

There is no strict rule governing the size of C. It is up
to the judgement of the control engineer to decide what
positive number to use for C in a particular
problem. Usually C is set around 0.5. Note
that C should be less than 1, since the actual cost change
should not exceed the predicted cost change. This, however,
could happen.

If (2.52) is satisfied, then $\Delta x$ is small enough, and the iteration process is repeated using the minimizing control as a new nominal control in the interval $[t_1, T]$. Usually $t_1$ is originally set at 0. If (2.52) is not satisfied, then set $t_1 = T/2$ and repeat. If the criterion is still not satisfied using the new $t_1$, then use $t_1' = \frac{T - t_1}{2} + t_1$ as the new starting point of the interval for integration. The process is repeated until the criterion is satisfied. Figure 2 is used to illustrate this scheme.

Figure 2. Half interval scheme to control the size of $\Delta x$
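The half interval scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `cost_ratio` is a hypothetical user-supplied callable that returns the ratio $\Delta J / a(\bar{x}, t_1)$ obtained when the minimizing control is applied on $[t_1, T]$ only, and `min_step` is an assumed lower bound standing in for one quantized integration step.

```python
def half_interval_start(T, cost_ratio, C=0.5, min_step=1e-3):
    """Return the first t1 in the halving sequence 0, T/2, 3T/4, ...
    for which cost_ratio(t1) > C, i.e. criterion (2.52) is met.

    cost_ratio is a hypothetical callable: it should integrate the
    dynamics with the nominal control on [0, t1] and the minimizing
    control on [t1, T], then return (actual cost change) / (predicted).
    Returns None if [t1, T] shrinks below min_step first.
    """
    t1 = 0.0
    while T - t1 >= min_step:
        if cost_ratio(t1) > C:
            return t1
        # Move the start of the interval halfway towards T.
        t1 = t1 + (T - t1) / 2.0
    return None
```

Returning `None` corresponds to the situation in which the interval has become smaller than one integration step, so that either `C` or the integration step must be adjusted.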

The following are a few characteristics of the "step size adjustment":

(1) The new nominal trajectory may have a corner at $t_1$ because $\bar{u}(t_1)$ may be different from $u^*(t_1)$. The integration routine used, therefore, must be capable of handling differential equations with discontinuous controls.

(2) If the minimizing trajectory coincides with the nominal trajectory during the latter portion of the interval but the nominal trajectory is non-optimal in the earlier portion, two steps must be taken. First, $a(t)$ must be monitored while performing the backward integration in step (2) of the routine, noting the time $t = t_{eff}$ when $a(t) > 0$ or when $a(t)$ is equal to or greater than a small positive constant. Then $[0, t_{eff}]$ is used instead of $[0, T]$.

(3) Since a computer is used in the routine, the quantization requirement may eventually conflict with the repeated halving of the interval $[0, T]$, i.e., $[t_1, T]$ may eventually be smaller than one quantized step. This difficulty may be remedied by either adjusting $C$ or using a smaller quantization for the integration.

2.4  Gradient  Projection  for  Outer  Optimization 

Two basic ideas are covered in this section. First, we must see how the function-space gradient of the minimum cost with respect to variations in the maximizing control $v(t)$ can be computed. Second, we must consider the fact that the search direction must lead us to a new feasible point and yet increase the minimum cost as much as possible.

2.4.1  Gradient  Calculation 
We are given $J(u^*(v), v)$.

To find the change in $J$ due to a variation in $v$, we have

$$\frac{dJ}{dv} = \frac{\partial J}{\partial v} + \frac{\partial u^*}{\partial v}\,\frac{\partial J}{\partial u}$$

If the minimum obtained from the inner optimization process is not on the boundary, then $\partial J / \partial u$ evaluated at the extremal would be equal to zero. In this case the gradient of $J$ with respect to a variation in $v$ would be equal to $\partial J / \partial v$.

In general, the first-order necessary conditions that

$u^*(t)$ minimizes $J(u, v)$ for any given $v(t)$ are

$$J^*(u^*, v) = F(x^*(T)) + \int_0^T L(x^*(t), u^*(t), v(t), t)\, dt \tag{2.53a}$$

$$\dot{x}^*(t) = f(x^*(t), u^*(t), v(t), t); \qquad x^*(0) = x_0 \tag{2.53b}$$

$$-\dot{J}_x^* = H_x; \qquad J_x^*(T) = F_x(x^*(T), T) \tag{2.53c}$$

$$H_u + \mu^T g_u = 0 \tag{2.53d}$$

where all the partial derivatives are evaluated at $(x^*(t), J_x^*, u^*(t), v(t))$.

Considering equation (2.53d), we anticipate that the previously optimal quantities $u^*(t)$, $x^*(t)$, $J^*$, $J_x^*(t)$ will have some variations with a small variation in the given $v(t)$, designated $\Delta v(t)$. We shall call these variations $\Delta u^*(t)$, $\Delta x^*(t)$, $\Delta J^*$, and $\Delta J_x^*(t)$ respectively. Expanding (2.53d) to first order and subtracting out all nominal quantities, we get

$$(H_{ux} + \mu^T g_{ux})\,\Delta x^* + f_u^T\,\Delta J_x^* + (H_{uu} + \mu^T g_{uu})\,\Delta u^* + (H_{uv} + \mu^T g_{uv})\,\Delta v = 0 \tag{2.54}$$

where care must be taken in the definition of the above partial derivatives to ensure that all the matrix and tensor operations in equation (2.54) are compatible.

Moreover, $\Delta J_x^*$ can be approximated to first order by $J_{xx}^*\,\Delta x^*$; equation (2.54) then becomes

$$(H_{ux} + \mu^T g_{ux} + f_u^T J_{xx}^*)\,\Delta x^* + (H_{uu} + \mu^T g_{uu})\,\Delta u^* + (H_{uv} + \mu^T g_{uv})\,\Delta v = 0 \tag{2.55}$$


Solving (2.55) for the change in $u^*$ with respect to a small variation in $v$, we obtain

$$\frac{du^*}{dv} = -(H_{uu} + \mu^T g_{uu})^{-1}\left[(H_{ux} + \mu^T g_{ux} + f_u^T J_{xx}^*)\,\frac{dx^*}{dv} + H_{uv} + \mu^T g_{uv}\right] \tag{2.56}$$

All the terms on the right hand side of equation (2.56) contain second order partial derivatives. An error analysis similar to the one given in section 2.3.1 can be given to show that $du^*/dv$ will be of second order as compared to the variations $\Delta v$ and $\Delta J$. Since we are dealing with a first order calculation, $\partial u^* / \partial v$ can be neglected. In general, then, the gradient of $J$ with respect to a variation in $v$, approximated to first order, would be equal to $\partial J / \partial v$. It is well known from the calculus of variations that at any instant $t$, $\partial J / \partial v$ is equal to $H_v$.
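The identity between the function-space gradient and $H_v$ can be checked numerically on a small example. The sketch below is an illustration only: it assumes scalar dynamics $\dot{x} = v - u$ with cost $\tfrac{1}{2}x^2(T) + \tfrac{1}{2}\int (q_1 u^2 - q_2 v^2)\,dt$, holds $u$ fixed rather than re-solving the inner minimization, and discretizes time by a rectangle rule. All function names are hypothetical.

```python
import numpy as np

def cost(u, v, x0, q1, q2, dt):
    """Discretized cost for x' = v - u on a uniform grid."""
    xT = x0 + np.sum(v - u) * dt
    return 0.5 * xT**2 + 0.5 * np.sum(q1 * u**2 - q2 * v**2) * dt

def hamiltonian_gradient_v(u, v, x0, q2, dt):
    """H_v(t_k) = lambda - q2 * v(t_k); here the costate is constant
    and equal to x(T) because H does not depend on x."""
    lam = x0 + np.sum(v - u) * dt
    return lam - q2 * v

def finite_difference_gradient(u, v, x0, q1, q2, dt, h=1e-4):
    """Central difference of the cost with respect to each v(t_k),
    divided by dt to obtain a function-space gradient."""
    g = np.zeros_like(v)
    for k in range(v.size):
        dv = np.zeros_like(v)
        dv[k] = h
        jp = cost(u, v + dv, x0, q1, q2, dt)
        jm = cost(u, v - dv, x0, q1, q2, dt)
        g[k] = (jp - jm) / (2.0 * h * dt)
    return g
```

Comparing the two gradients on random controls should give agreement up to rounding error, since the discretized cost is quadratic in $v$.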


2.4.2  Gradient  Projection 




We are now dealing with the problem

$$\max_v \left[\min_u J\right] = \max_v J^* \qquad \text{s.t.} \quad g(x, u, v, t) \le 0 \tag{2.55}$$

McFarland used the gradient as the search direction for his algorithm. In the presence of state and control constraints, however, the gradient direction may lead to an infeasible point for the next iteration. Some adjustment must therefore be made to counter this drawback. The best-known method in nonlinear programming for handling this situation is the so-called "projection method". In essence, linear constraints, or linearized constraints, form a linear manifold (defined by the region formed by the intersection of the constraints). The gradient direction can then be projected onto this manifold to produce a search direction $s$ such that $J_v^{*T} s > 0$, so that movement in the direction $s$ will cause an increase in the functional $J$ at a new feasible point.


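Let $A_q$ be the $q \times p$ matrix of active constraints. Actually $A_q$ will depend on $v$; this dependence, however, will be suppressed here to save space.

$$A_q = \begin{bmatrix} \dfrac{\partial g_1}{\partial v_1} & \cdots & \dfrac{\partial g_1}{\partial v_p} \\ \vdots & & \vdots \\ \dfrac{\partial g_q}{\partial v_1} & \cdots & \dfrac{\partial g_q}{\partial v_p} \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1p} \\ \vdots & & \vdots \\ a_{q1} & \cdots & a_{qp} \end{bmatrix} \tag{2.56}$$

We note here that if the constraints are linear with respect to the controls, which is usually the case in the pursuit and evasion problem, then the elements $a_{ij}$ of the matrix $A_q$ are just the coefficients of the control variables in the active constraints.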


Figure 3. Gradient projection method applied at an active constraint. If the constraint is nonlinear, $v^{(k+1)}$ may be infeasible.


If the active constraints are linear, the search direction $s$ will lie along the constraints themselves. If not, then the search direction will lie along the hyperplanes tangent to the constraints at $v^{(k)}$, where the superscript $(k)$ indicates the iteration number. If $v^{(k+1)}$ proves to be infeasible as shown in figure 3, a restoration phase will be used to move to the closest feasible point. Since the problems we shall be dealing with later on contain constraints which are linear with respect to the controls, we shall be concerned only with such problems from here on. For this class of problems, we shall always end up at a feasible point at the end of each iteration.

To compute the projection, let us call the linear manifold formed by the active constraints $\mathcal{M}$. Assuming regularity of the active constraints, $A_q$ is a matrix of rank $q < p$. Since $s$ must lie in $\mathcal{M}$, we must have $A_q s = 0$.

Using the projection theorem in functional analysis, the gradient $J_v^*$ can be decomposed into two parts as follows:

$$J_v^* = s + A_q^T \mu \tag{2.57}$$

where $s \in \mathcal{M}$ and $A_q^T \mu \in \mathcal{M}^{\perp}$. Multiplying (2.57) throughout by $A_q$ and using the fact that $A_q s = 0$, we have

$$A_q s = A_q J_v^* - (A_q A_q^T)\,\mu = 0 \tag{2.58}$$

from which we get

$$\mu = (A_q A_q^T)^{-1} A_q J_v^* \tag{2.59}$$

Substituting this back into (2.57) and manipulating, then

$$s = \left[I - A_q^T (A_q A_q^T)^{-1} A_q\right] J_v^* \tag{2.60}$$

The matrix

$$P_q = \left[I - A_q^T (A_q A_q^T)^{-1} A_q\right] \tag{2.61}$$

is called the projection matrix or the projection operator on the vector $J_v^*$ with respect to the subspace $\mathcal{M}$. The outer optimization terminates when $\|s\|$ is arbitrarily small and all $\mu_i > 0$, where $\mu_i$ are components of $\mu$ computed from equation (2.59).
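A minimal numerical sketch of the projection is given below. It treats the gradient as an ordinary finite-dimensional vector at one time instant, which is an assumption made for illustration; the names `project_gradient`, `A_q`, and `grad` are hypothetical.

```python
import numpy as np

def project_gradient(A_q, grad):
    """Split grad into s (with A_q @ s = 0) and A_q.T @ mu.

    A_q  : (q, p) matrix of active-constraint coefficients, rank q < p
    grad : (p,) gradient vector
    """
    AAT = A_q @ A_q.T
    # Multipliers, as in (2.59); solve() avoids forming the inverse.
    mu = np.linalg.solve(AAT, A_q @ grad)
    # Projected direction; algebraically the same as (2.60).
    s = grad - A_q.T @ mu
    return s, mu
```

For instance, with the single active constraint row `[1, 0, 0]` and gradient `[2, 3, 4]`, the component along the constraint normal is removed and the multiplier equals that component.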

2.5  Algorithm  Steps 

As  mentioned  before,  we  shall  cover  the  algorithm  steps 
required  only  for  the  maxmin  operation.  The  minmax  operation 
is  similar  with  the  interchange  in  controls  for  the  inner  and 
the  outer  optimization  and  the  appropriate  change  in  signs 
in  the  search  directions. 

Starting Procedure

(1) Select a nominal control $v(t)$, $v_0(t)$, by suitable logic using some physical insight or whatever is readily available.

(2) Calculate all the local minima of $J(u, v_0)$ and rank them in ascending order

$$J(u_0^{(1)}, v_0) < J(u_0^{(2)}, v_0) < \cdots < J(u_0^{(n)}, v_0)$$

where $u_0^{(i)}$ = locally minimizing control, $i = 1, 2, \ldots, n$.

The following steps apply for the $k$-th and $(k+1)$-th iterations ($k = 0, 1, 2, \ldots$).

(3) Calculate $J_{v_k}$ and its norm $\|J_{v_k}\|$, where

$$\|J_{v_k}\| = \int_0^T J_{v_k}^T(t)\, J_{v_k}(t)\, dt \tag{2.62}$$

(4) If $\|J_{v_k}\| < \varepsilon_1$, a small positive constant, exit; a saddle point is located at $(u_k, v_k)$. If not, continue.

(5) Find the search direction $s_k$ using

$$s_k = P_k\, J_{v_k} \tag{2.63}$$

where

$$P_k = I - A_{q_k}^T (A_{q_k} A_{q_k}^T)^{-1} A_{q_k} \tag{2.64}$$

(6) Calculate

$$\|s_k\| = \int_0^T s_k^T(t)\, s_k(t)\, dt \tag{2.65}$$

$$\mu_k = (A_{q_k} A_{q_k}^T)^{-1} A_{q_k} J_{v_k} \tag{2.66}$$

(7) If $\|s_k\| < \varepsilon_2$, a small positive constant, and all $\mu_i > 0$, where $\mu_i$ are elements of the multiplier vector $\mu_k$, exit; a solution is located on the boundary of the problem. If $\|s_k\| < \varepsilon_2$ and some $\mu_i < 0$, drop the corresponding constraint and go to (5). If $\|s_k\| \ge \varepsilon_2$, continue.

(8) Form the new control

$$v_{k+1}(t) = v_k(t) + \alpha_k\, s_k(t) \tag{2.67}$$

where $\alpha_k$ is a suitable stepsize logic. If $v_{k+1}(t)$ activates another constraint, then that constraint must be added to the matrix $A_{q_{k+1}}$.

(9) Calculate all the new local minima of $J$:

$$J(u_{k+1}^{(1)}, v_{k+1}),\; J(u_{k+1}^{(2)}, v_{k+1}),\; \ldots,\; J(u_{k+1}^{(n)}, v_{k+1})$$

(10) If

$$J(u_{k+1}^{(i)}, v_{k+1}) < J(u_{k+1}^{(1)}, v_{k+1}); \qquad i \ne 1 \tag{2.68}$$

one of the previously higher minima has overtaken the heretofore minimum cost. The crossover point has been overshot; go to (11). If not, increase the iteration number by one, go to (3), and reiterate.

(11) Find the cross-over point where

$$J(u_{k+1}^{(1)}, v_{k+1}) = J(u_{k+1}^{(i)}, v_{k+1}) \tag{2.69}$$

where $i$ is the index for which the inequality in (2.68) is true.

The resulting $v_{k+1}$ that satisfies (2.69) is the required maxmin solution.
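The branching in step (7) can be sketched as follows. This is a hedged illustration, not the routine used in this work: it assumes the usual gradient projection convention that a constraint whose multiplier is negative is released, and it releases the most negative one first, which is a choice made here for definiteness. The function name and return values are hypothetical.

```python
import numpy as np

def step7_decision(s_norm, mu, eps2=1e-6):
    """Decide what to do after the projected direction is computed.

    Returns ("continue", None) if the projected gradient is not small,
            ("exit", None)     if it is small and all multipliers > 0,
            ("drop", i)        if it is small and multiplier i is the
                               most negative (assumed release rule).
    """
    mu = np.asarray(mu, dtype=float)
    if s_norm >= eps2:
        return "continue", None
    if mu.size == 0 or np.all(mu > 0):
        return "exit", None
    return "drop", int(np.argmin(mu))
```

A caller would remove row `i` from the active-constraint matrix on `"drop"` and recompute the projection, as in the return to step (5).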


CHAPTER  3 


LINEAR  QUADRATIC  INTERCEPT  PROBLEM 

In this chapter, a general linear-quadratic differential game will be formulated and reduced to a simpler form. Analytical solutions will be presented for the unconstrained case to illustrate the difference in the level of difficulty between the cases with and without the assumption that a saddle point exists. A linear-quadratic pursuit and evasion differential game will then be investigated. The case without any control constraint will be solved analytically. Through a simple illustrative example, the physical outcomes corresponding to the parameters of the problem will be investigated. The case with control constraints cannot be solved analytically. Two numerical solutions will be offered. The first uses an indirect approach with the assumption that a saddle point exists and a direct application of DDP, similar to the algorithm used by Neeland and later by Jarmark. The second uses the algorithm developed in chapter 2 to solve the same problem without the assumption of a saddle point.

All the problems considered in this chapter will be assumed to have perfect information.

3.1  Formulation  of  Linear-Quadratic  Differential  Game 

Generally, a linear-quadratic game will have the following cost functional:

$$J(u, v) = \tfrac{1}{2}\, y^T(T)\, C^T C\, y(T) + \tfrac{1}{2} \int_0^T \left[u^T(t) Q_1 u(t) - v^T(t) Q_2 v(t)\right] dt \tag{3.1}$$

where

$y$ = $n \times 1$ state vector
$C$ = $r \times n$ terminal weighting matrix, $r \le n$
$u$ = $m \times 1$ control vector of player U
$v$ = $p \times 1$ control vector of player V
$Q_1$, $Q_2$ = $m \times m$ and $p \times p$ positive definite matrices

Note that $C^T C$ is at least positive semidefinite and that any positive semidefinite matrix may be expressed as the product $C^T C$.

The state vector $y(t)$ is driven by the following dynamics and initial condition:

$$\dot{y}(t) = F(t)\, y(t) + \bar{G}_1(t)\, u(t) + \bar{G}_2(t)\, v(t); \qquad y(0) = y_0 \tag{3.2}$$

where

$F(t)$ = $n \times n$ system dynamic matrix
$\bar{G}_1(t)$, $\bar{G}_2(t)$ = $n \times m$ and $n \times p$ system input distribution matrices

The state vector $y$ with dimension $n$ can be reduced to the more convenient and often more meaningful "reduced state vector" $x$ with dimension $r \le n$. Define

$$x(t) = C\, \Phi(T, t)\, y(t) \tag{3.3}$$

where $\Phi(T, t)$ is the state transition matrix satisfying

$$\dot{\Phi}(T, t) = -\Phi(T, t)\, F(t); \qquad \Phi(T, T) = I \tag{3.4}$$

Differentiating (3.3) we get

$$\begin{aligned} \dot{x}(t) &= C\, \dot{\Phi}(T, t)\, y(t) + C\, \Phi(T, t)\, \dot{y}(t) \\ &= -C\, \Phi(T, t) F(t) y(t) + C\, \Phi(T, t) F(t) y(t) + C\, \Phi(T, t) \bar{G}_1(t) u(t) + C\, \Phi(T, t) \bar{G}_2(t) v(t) \\ &= G_1(t)\, u(t) + G_2(t)\, v(t) \end{aligned} \tag{3.5}$$

where

$$G_1(t) = C\, \Phi(T, t)\, \bar{G}_1(t), \qquad G_2(t) = C\, \Phi(T, t)\, \bar{G}_2(t)$$

Note that

$$x(0) = C\, \Phi(T, 0)\, y(0) = x_0, \qquad x(T) = C\, \Phi(T, T)\, y(T) = C\, y(T)$$

Thus, the cost function and the dynamics can be rewritten as

$$J(u, v) = \tfrac{1}{2}\, x^T(T)\, x(T) + \tfrac{1}{2} \int_0^T \left(u^T Q_1 u - v^T Q_2 v\right) dt \tag{3.6}$$

and

$$\dot{x}(t) = G_1(t)\, u(t) + G_2(t)\, v(t); \qquad x(0) = x_0 \tag{3.7}$$

Since no assumption is made in the derivation of this reduced state form, it is as general as the form represented by equations (3.1) and (3.2). The state $x$, however, is a more meaningful measure of the game than $y$, since $x$ indicates what the game will end up with if no further control is applied by any player.

3.2  Analytical  Closed-Loop  Solution 

Analyses  of  the  problem  represented  by  equations  (3.6) 
and  (3.7)  will  be  presented  in  this  section:  one  with  the 
assumption  of  existence  of  a  saddle  point  and  another  without 
such  assumption. 

3.2.1 With Assumption that Saddle Point Exists

The Hamiltonian of the problem is

$$H = \tfrac{1}{2}\left(u^T Q_1 u - v^T Q_2 v\right) + \lambda^T \left(G_1 u + G_2 v\right) \tag{3.8}$$

The costate equation is

$$\dot{\lambda} = -\frac{\partial H}{\partial x} = 0 \quad \Rightarrow \quad \lambda = \text{constant vector}$$

Then

$$\lambda(t) = \lambda(T) = x(T) \tag{3.9}$$

The last equality is obtained from the transversality condition.

Since we are dealing with the unconstrained problem, the Pontryagin optimality principle states that

$$\frac{\partial H}{\partial u} = Q_1 u^* + G_1^T \lambda = 0 \tag{3.10}$$

and

$$\frac{\partial H}{\partial v} = -Q_2 v^* + G_2^T \lambda = 0 \tag{3.11}$$

Note that these two equations can be used simultaneously because we have the assumption that a saddle point exists. Also, the positive definiteness of $Q_1$ and $Q_2$ guarantees that $u^*$ and $v^*$ are the minimum and the maximum respectively. Thus

$$u^* = -Q_1^{-1} G_1^T\, x(T) \tag{3.12}$$

$$v^* = Q_2^{-1} G_2^T\, x(T) \tag{3.13}$$

Substituting $u^*$ and $v^*$ back into (3.7), integrating from $t$ to $T$, and solving for $x(T)$, we get

$$x(T) = W(t)\, x(t) \tag{3.14}$$

where

$$W(t) = \left[I + \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau\right]^{-1} \tag{3.15}$$

Therefore, the saddle point solution is

$$u^*(t) = -Q_1^{-1} G_1^T(t)\, W(t)\, x(t) \tag{3.16a}$$

$$v^*(t) = Q_2^{-1} G_2^T(t)\, W(t)\, x(t) \tag{3.16b}$$

$$J(u^*, v^*) = \tfrac{1}{2}\, x^T(t)\, W(t)\, x(t) \tag{3.16c}$$

3.2.2 Without Assumption that Saddle Point Exists

In this section the minmax and the maxmin solutions must be solved separately. With the saddle point assumption, the condition for the solution to exist is that $W(t)$ must be positive definite for all $t$ in the interval $[0, T]$ of the game. We shall see that the condition becomes more stringent without the saddle point assumption.

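MINMAX SOLUTION:

First we look for the $\hat{v}$ that maximizes $J(u, v)$ for an arbitrary $u$. For this purpose, equation (3.13) is still valid and we have

$$\hat{v}(t) = Q_2^{-1} G_2^T(t)\, x(T) \tag{3.17}$$

Substituting $\hat{v}$ into the dynamic equation (3.7), integrating from $t$ to $T$, and solving for $x(T)$, we get

$$x(T) = P(t)\left[x(t) + \int_t^T G_1(\tau)\, u(\tau)\, d\tau\right], \qquad 0 \le t \le T \tag{3.18}$$

where

$$P(t) = \left[I - \int_t^T G_2(\tau)\, Q_2^{-1} G_2^T(\tau)\, d\tau\right]^{-1} \tag{3.19}$$

For $P(t)$ to exist we must have

$$\left[I - \int_t^T G_2(\tau)\, Q_2^{-1} G_2^T(\tau)\, d\tau\right] > 0 \qquad \text{for all } 0 \le t \le T \tag{3.20}$$

then

$$\hat{v}(t) = Q_2^{-1} G_2^T(t)\, P(t)\left[x(t) + \int_t^T G_1(\tau)\, u(\tau)\, d\tau\right] \tag{3.21}$$

Note that $\hat{v}(t)$ is globally optimal as long as (3.20) holds.

Since the starting time $t = 0$ is arbitrary, we rewrite the cost functional in equation (3.6) for the game from time $t$ onward, substitute (3.17) for $v$, and manipulate to get

$$J(u, \hat{v}) = \tfrac{1}{2}\, x^T(T)\, P^{-1}(t)\, x(T) + \tfrac{1}{2} \int_t^T u^T Q_1 u\, d\tau \tag{3.22}$$

Using (3.18) in (3.22), then

$$J(u, \hat{v}) = \tfrac{1}{2}\left[x + \int_t^T G_1 u\, d\tau\right]^T P \left[x + \int_t^T G_1 u\, d\tau\right] + \tfrac{1}{2} \int_t^T u^T Q_1 u\, d\tau \tag{3.23}$$

Let $u(\tau)$ vary by a small amount $\Delta u(\tau)$, expand (3.23) to first order in a Taylor series, and subtract out the nominal terms:

$$\Delta J(u, \hat{v}) = \int_t^T \left\{\left[x(t) + \int_t^T G_1 u\, d\sigma\right]^T P(t)\, G_1(\tau) + u^T(\tau)\, Q_1\right\} \Delta u(\tau)\, d\tau \tag{3.24}$$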

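Since $\Delta u(\tau)$ is arbitrary, the variation of the cost functional is zero only when the term $\{\cdots\} = 0$ in equation (3.24); hence

$$\hat{u}(\tau) = -Q_1^{-1} G_1^T(\tau)\, P(t)\left[x(t) + \int_t^T G_1 \hat{u}\, d\sigma\right], \qquad t \le \tau \le T \tag{3.25}$$

To solve for $\hat{u}$ we multiply each side by $G_1(\tau)$ and integrate from $t$ to $T$:

$$\int_t^T G_1(\tau)\, \hat{u}(\tau)\, d\tau = -\int_t^T G_1(\tau)\, Q_1^{-1} G_1^T(\tau)\, P(t)\left[x(t) + \int_t^T G_1 \hat{u}\, d\sigma\right] d\tau \tag{3.26}$$

Manipulating, we get

$$\int_t^T G_1 \hat{u}(\tau)\, d\tau = -\left[I + \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau \cdot P(t)\right]^{-1} \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau \cdot P(t)\, x(t) \tag{3.27}$$

Let

$$A = \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau \cdot P(t) \tag{3.28}$$

Since $A$ is positive semidefinite, $I + A$ is positive definite for all $t \in [0, T]$. Therefore the inverse in (3.27) always exists. Adding $x(t)$ to both sides of (3.27),

$$\begin{aligned} x(t) + \int_t^T G_1 \hat{u}(\tau)\, d\tau &= x(t) - [I + A]^{-1} A\, x(t) \\ &= \left\{I - [I + A]^{-1} A\right\} x(t) \\ &= [I + A]^{-1} x(t) \end{aligned} \tag{3.29}$$

Multiplying both sides by $P(t)$ we have

$$P(t)\left[x(t) + \int_t^T G_1 \hat{u}(\tau)\, d\tau\right] = P(t)\left[I + \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau\, P(t)\right]^{-1} x(t) \tag{3.30}$$

Now let

$$A = \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau \tag{3.31}$$

and

$$B = \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau \tag{3.32}$$

so that $P(t) = [I - B]^{-1}$; then the right hand side of (3.30) becomes

$$\text{RHS} = [I - B]^{-1}\left\{I + A\,[I - B]^{-1}\right\}^{-1} x(t) = [I - B + A]^{-1} x(t)$$

Therefore

$$P(t)\left[x(t) + \int_t^T G_1 \hat{u}(\tau)\, d\tau\right] = \left[I + \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau\right]^{-1} x(t) = W(t)\, x(t) \tag{3.33}$$

Substituting back into (3.25) yields

$$\hat{u}(\tau) = -Q_1^{-1} G_1^T(\tau)\, W(t)\, x(t), \qquad t \le \tau \le T \tag{3.34}$$

Substituting $\hat{u}(\tau)$ into (3.21) and performing a matrix manipulation similar to the one above,

$$\hat{v}(t) = Q_2^{-1} G_2^T(t)\, W(t)\, x(t) \tag{3.35}$$

Comparing (3.17) with (3.35) we have

$$x(T) = W(t)\, x(t) \tag{3.36}$$

Using (3.34), (3.35) and (3.36) in (3.22) we get

$$J(\hat{u}, \hat{v}) = \tfrac{1}{2}\, x^T(t)\, W(t)\, x(t) \tag{3.37}$$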

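MAXMIN SOLUTION:

In this case we first look for $\tilde{u} = \min_u J(u, v)$. Equation (3.12) can be used to start off:

$$\tilde{u}(t) = -Q_1^{-1} G_1^T(t)\, x(T) \tag{3.38}$$

Substituting $\tilde{u}(t)$ into (3.7), integrating, and solving for $x(T)$,

$$x(T) = M(t)\left[x(t) + \int_t^T G_2(\tau)\, v(\tau)\, d\tau\right] \tag{3.39}$$

where

$$M(t) = \left[I + \int_t^T G_1(\tau)\, Q_1^{-1} G_1^T(\tau)\, d\tau\right]^{-1} \tag{3.40}$$

hence

$$\tilde{u}(t) = -Q_1^{-1} G_1^T(t)\, M(t)\left[x(t) + \int_t^T G_2(\tau)\, v(\tau)\, d\tau\right] \tag{3.41}$$

Note that $M$ always exists because $A \ge 0$ and $I + A > 0$ for all $0 \le t \le T$. Then

$$\Delta J(v, \tilde{u}) = \int_t^T \left\{\left[x(t) + \int_t^T G_2 v\, d\sigma\right]^T M(t)\, G_2(\tau) - v^T(\tau)\, Q_2\right\} \Delta v(\tau)\, d\tau \tag{3.42}$$

For arbitrary $\Delta v(\tau)$, again $\{\cdots\}$ must be zero for the variation to be zero; hence

$$\tilde{v}(\tau) = Q_2^{-1} G_2^T(\tau)\, M(t)\left[x(t) + \int_t^T G_2 \tilde{v}\, d\sigma\right], \qquad t \le \tau \le T \tag{3.43}$$

Premultiplying (3.43) by $G_2(\tau)$ and integrating from $t$ to $T$,

$$\int_t^T G_2(\tau)\, \tilde{v}(\tau)\, d\tau = \int_t^T G_2(\tau)\, Q_2^{-1} G_2^T(\tau)\, M(t)\left[x(t) + \int_t^T G_2 \tilde{v}\, d\sigma\right] d\tau \tag{3.44}$$

Again, (3.44) can be rearranged to give

$$\int_t^T G_2(\tau)\, \tilde{v}(\tau)\, d\tau = \left[I - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau \cdot M(t)\right]^{-1} \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau\; M(t)\, x(t) \tag{3.45}$$

$$\begin{aligned} \left[I - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau \cdot M(t)\right]^{-1} &= \left[\left(M^{-1}(t) - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau\right) M(t)\right]^{-1} \\ &= \left[W^{-1}(t)\, M(t)\right]^{-1} \\ &= M^{-1}(t)\, W(t) \end{aligned} \tag{3.46}$$

Therefore, the indicated inverse in (3.45) exists if $W(t)$ exists, which is the same condition as for the existence of the saddle point solution. Substituting (3.46) back into (3.45),

$$\int_t^T G_2(\tau)\, \tilde{v}(\tau)\, d\tau = M^{-1}(t)\, W(t) \int_t^T G_2(\tau)\, Q_2^{-1} G_2^T(\tau)\, d\tau\; M(t)\, x(t) \tag{3.47}$$

Adding $x(t)$ to both sides, premultiplying by $M(t)$, and manipulating, we obtain

$$M(t)\left[x(t) + \int_t^T G_2(\tau)\, \tilde{v}(\tau)\, d\tau\right] = W(t)\, x(t) \tag{3.48}$$

Hence

$$\tilde{v}(\tau) = Q_2^{-1} G_2^T(\tau)\, W(t)\, x(t), \qquad t \le \tau \le T \tag{3.49}$$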

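All the above results may be summarized as follows:

SADDLE POINT SOLUTION

Optimal controls:

$$u^*(t) = -Q_1^{-1} G_1^T(t)\, W(t)\, x(t), \qquad v^*(t) = Q_2^{-1} G_2^T(t)\, W(t)\, x(t)$$

Optimal cost:

$$J(u^*, v^*) = \tfrac{1}{2}\, x_0^T\, W(0)\, x_0$$

Condition for existence:

$$I + \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau > 0 \qquad \forall\; 0 \le t \le T$$

MINMAX SOLUTION

Optimal controls:

$$\hat{u}(t) = -Q_1^{-1} G_1^T(t)\, W(t)\, x(t), \qquad \hat{v}(t) = Q_2^{-1} G_2^T(t)\, W(t)\, x(t)$$

Optimal cost:

$$J(\hat{u}, \hat{v}) = \tfrac{1}{2}\, x_0^T\, W(0)\, x_0$$

Condition for existence:

$$I - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau > 0 \qquad \forall\; 0 \le t \le T$$

MAXMIN SOLUTION

Optimal controls:

$$\tilde{u}(t) = -Q_1^{-1} G_1^T(t)\, W(t)\, x(t), \qquad \tilde{v}(t) = Q_2^{-1} G_2^T(t)\, W(t)\, x(t)$$

Optimal cost:

$$J(\tilde{u}, \tilde{v}) = \tfrac{1}{2}\, x_0^T\, W(0)\, x_0$$

Condition for existence:

$$I + \int_t^T G_1 Q_1^{-1} G_1^T\, d\tau - \int_t^T G_2 Q_2^{-1} G_2^T\, d\tau > 0 \qquad \forall\; 0 \le t \le T$$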

DISCUSSION: 

The following remarks may be made about these solutions:

(1) Controls and costs are the same in each case; thus a saddle point does indeed exist for the linear-quadratic differential game. The optimal cost in each case is the value of the game, and neither player can do anything unilaterally to improve his cost.

(2) The conditions for the existence of the saddle point solution and the maxmin solution are the same. For the minmax solution, however, the condition is more stringent. It might be noted that if the necessary and sufficient condition for the minmax solution is satisfied for a game, then the necessary and sufficient condition for both the maxmin and the saddle point solutions will be automatically satisfied, because the additional matrix $\int_t^T G_1 Q_1^{-1} G_1^T\, d\tau$ that appears in the conditions for the latter two is at least positive semidefinite.

(3) All the solutions obtained for this problem are in closed-loop form, which is indeed more desirable than the open-loop solution. It might be noted, however, that the linear-quadratic problem is about the only type of differential game for which a closed-loop solution may be found analytically. Moreover, as we shall see later, if constraints on state and controls are added to the game, a closed-loop solution is not always guaranteed even for a linear-quadratic problem.
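Remark (2) can be illustrated numerically. The sketch below is an assumption-laden example rather than part of the derivation: it takes $G_1$, $G_2$ constant in time, in which case both integrals are linear in $T - t$ and it suffices to test positive definiteness at $t = 0$. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def existence_conditions(G1, Q1, G2, Q2, T):
    """Test the existence conditions at t = 0 for constant G1, G2.

    minmax            : I - T * G2 Q2^{-1} G2^T > 0
    saddle_and_maxmin : I + T * G1 Q1^{-1} G1^T - T * G2 Q2^{-1} G2^T > 0
    """
    n = G1.shape[0]
    A = T * G1 @ np.linalg.solve(Q1, G1.T)
    B = T * G2 @ np.linalg.solve(Q2, G2.T)
    I = np.eye(n)

    def is_pd(M):
        # Symmetrize before the eigenvalue test to absorb rounding.
        return bool(np.all(np.linalg.eigvalsh((M + M.T) / 2.0) > 0.0))

    return {"minmax": is_pd(I - B), "saddle_and_maxmin": is_pd(I + A - B)}
```

For a scalar game with $G_1 = -1$, $G_2 = 1$, $Q_1 = 1$, $Q_2 = 0.8$ and $T = 1$, the saddle point and maxmin condition holds while the minmax condition fails, which shows that the minmax condition is the more stringent one.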

3.3  An  Illustrative  Example  Without  Control  Constraint 

The example given in this section is the same as the one considered by McFarland. However, more extensive results will be offered to gain a more meaningful insight into the problem. Our goal here is to show that, contrary to McFarland's implication that the linear-quadratic formulation of a pursuit-evasion problem is likely to yield a trivial solution, this difficulty may be avoided by careful examination of the problem to avoid any conjugate point in the solution. The incentive to present the illustrative example in this manner is twofold. First, it will be shown that even a simple unconstrained linear-quadratic problem can have a meaningful physical realization. Second, in the next section, the same example with constraints will be used to show that analytical solutions are not possible in such a case, and numerical solutions with and without the saddle point assumption will be offered.


[Figure 5. Planar intercept diagram; labels: Line of Intercept, Evader, Pursuer]

A simple planar intercept problem is diagrammed in figure 5. $x_e$ and $x_p$ are the lateral positions of the players, who move towards one another with constant forward velocities $S_e$ and $S_p$ respectively. The interception time, or the time when the players pass each other, is $T = \dfrac{L}{S_e + S_p}$ seconds. Each player controls his lateral position by using his lateral velocity, $u(t)$ and $v(t)$ respectively, as control input. Thus we have

$$\dot{x}_p = u, \quad x_p(0) = 0; \qquad \dot{x}_e = v, \quad x_e(0) = 0 \tag{3.52}$$

Generally, $x_p$ and $x_e$ can be used as the states of the problem. But as we shall see, it is more convenient and more meaningful to define a state $x(t)$ as

$$x(t) = x_e(t) - x_p(t) \tag{3.53}$$

Thus $x(t)$ may be interpreted as the lateral miss distance if both players use no further control from time $t$ until the end of the game. Using (3.53) in (3.52), the dynamic equation is

$$\dot{x}(t) = v(t) - u(t), \qquad x(0) = 0 \tag{3.54}$$

The pursuer is trying to minimize the lateral miss distance at the time of interception without using excessive control energy, while the evader is trying to maximize the same lateral miss distance while using reasonable control. Therefore, we have a two person zero-sum differential game with the cost function:

$$J(u, v) = \tfrac{1}{2}\, x^2(T) + \tfrac{1}{2} \int_0^T \left(q_1 u^2 - q_2 v^2\right) dt \tag{3.55}$$

Using the results of section 3.2, the solutions are

$$\hat{u}(t) = \tilde{u}(t) = u^*(t) = \frac{1}{q_1}\, W(t)\, x(t) \tag{3.56a}$$

$$\hat{v}(t) = \tilde{v}(t) = v^*(t) = \frac{1}{q_2}\, W(t)\, x(t) \tag{3.56b}$$

where

$$W(t) = \left(1 + \frac{T - t}{q_1} - \frac{T - t}{q_2}\right)^{-1} \tag{3.57}$$

The necessary and sufficient condition for the minmax solution is

$$1 - \frac{T - t}{q_2} > 0 \qquad \text{for } 0 \le t \le T \tag{3.58}$$

This is the same as the condition $q_2 > T$. Substituting (3.56) into (3.54), it is apparent that the only stable solution for the resulting differential equation is $x(t) = 0$ for $0 \le t \le T$. Therefore

$$\hat{u}(t) = \tilde{u}(t) = u^*(t) = 0 \tag{3.59a}$$

$$\hat{v}(t) = \tilde{v}(t) = v^*(t) = 0 \tag{3.59b}$$

This solution makes sense from the pursuer's viewpoint: since the initial lateral miss distance is zero and the evader is not making any move, the pursuer can hold his position until he runs into the evader at the time of interception. From the evader's point of view, however, this is indeed a strange solution, since we would expect him to do something to avoid collision with the pursuer.

This strange result occurs because (3.58) calls for too much weight on the control $v(t)$; otherwise we would have the maxmin solution $v(t) \to \infty$. However, we only have this dilemma for $x(0) = 0$, which is the only case McFarland considered. If we let $x(0) = x_0 \ne 0$, then the solutions are

$$\hat{u}(t) = \tilde{u}(t) = u^*(t) = \frac{q_2}{q_1 q_2 + (q_2 - q_1)\, T}\; x_0 \tag{3.60a}$$

$$\hat{v}(t) = \tilde{v}(t) = v^*(t) = \frac{q_1}{q_1 q_2 + (q_2 - q_1)\, T}\; x_0 \tag{3.60b}$$

with the value of the game

$$J(\hat{u}, \hat{v}) = \frac{1}{2}\, \frac{q_1 q_2}{q_1 q_2 + (q_2 - q_1)\, T}\; x_0^2 \tag{3.61}$$

The interpretation of this result can be summarized as follows:

Case 1: $q_1 = q_2 = q$

then

$$\hat{u} = \tilde{u} = u^* = \frac{1}{q}\, x_0, \qquad \hat{v} = \tilde{v} = v^* = \frac{1}{q}\, x_0, \qquad J(\hat{u}, \hat{v}) = \tfrac{1}{2}\, x_0^2$$

In this case the evader cannot get further away from the initial lateral displacement if the evader is using his optimal control.

Case 2: $q_1 > q_2$

then from (3.60) $|u(t)| < |v(t)|$. Physically, this makes sense: since the pursuer is putting more weight on his control, he is penalized more than the evader if both players use the same amount of control. Therefore, the pursuer is induced to use less control than the evader. Also, since

$$x(T) = \frac{q_1 q_2}{q_1 q_2 + (q_2 - q_1)\, T}\; x_0 \tag{3.62}$$

$x(T)$ is larger than $x_0$ in this case. Moreover, the larger $q_1$ is relative to $q_2$, the larger $x(T)$ will be. Thus the evader can escape if $q_1$ is large enough, that is, if the pursuer is restricted to using a small amount of control. It is interesting to note that, if the necessary and sufficient condition in equation (3.58) is satisfied, $x(T)$ cannot be negative with respect to $x_0$.

Case 3: $q_1 < q_2$

then $|u(t)| > |v(t)|$. The evader is induced to use less control in this case because more weight is being put on his control. From (3.62), $x(T)$ is smaller than $x_0$ in this case, and the pursuer can get closer to the evader than the initial lateral displacement. Interception can be made if $q_2$ is large enough. The magnitude of $q_2$ required for interception depends upon the radius of interception, the magnitude of $x_0$, the time of interception $T$, and the weight $q_1$ on the pursuer's control.

This example clearly illustrates that even a simple unconstrained linear-quadratic problem can be meaningful if it is set up carefully to avoid the conjugate point difficulty.
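The three cases can be checked with a short closed-loop simulation. The sketch below is an illustration under stated assumptions: it integrates $\dot{x} = v - u$ with the feedback laws $u = W(t)x/q_1$, $v = W(t)x/q_2$ by a forward Euler rule, and compares the terminal miss with the closed form $q_1 q_2 x_0 / (q_1 q_2 + (q_2 - q_1)T)$. The function names and the step count are hypothetical choices.

```python
def terminal_miss(x0, q1, q2, T, n_steps=20000):
    """Forward-Euler simulation of the closed-loop scalar game."""
    c = 1.0 / q1 - 1.0 / q2
    dt = T / n_steps
    x = x0
    for k in range(n_steps):
        t = k * dt
        W = 1.0 / (1.0 + (T - t) * c)   # scalar W(t) of the example
        u = W * x / q1                  # pursuer feedback
        v = W * x / q2                  # evader feedback
        x += (v - u) * dt
    return x

def terminal_miss_closed_form(x0, q1, q2, T):
    """Closed-form terminal miss distance x(T) = W(0) * x0."""
    return q1 * q2 * x0 / (q1 * q2 + (q2 - q1) * T)
```

With $T = 1$ and $x_0 = 1$, taking $q_1 = q_2$ leaves the miss distance unchanged, $q_1 > q_2$ increases it, and $q_1 < q_2$ decreases it, in line with cases 1 to 3.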

3.4  Linear  Quadratic  Problem  with  Hard  Limit  on  Controls 


In this section, we shall use the same illustrative problem described in section 3.3. The cost function and the dynamic equation are repeated here for convenience.

Cost:

$$J(u, v) = \tfrac{1}{2}\, x^2(T) + \tfrac{1}{2} \int_0^T \left(q_1 u^2 - q_2 v^2\right) dt \tag{3.63}$$

Dynamics:

$$\dot{x} = v(t) - u(t), \qquad x(0) = x_0 \tag{3.64}$$

In addition we add the following constraints:

$$|u| \le 1, \qquad |v| \le 1 \tag{3.65}$$

This problem can be solved analytically if the parameters are set up in such a way that $u$ and $v$ do not exceed their limits. In that case the results are of course the same as those presented in the last section. The actual derivation of the conditions and the solutions for this problem, such that the optimal controls lie within the control boundaries, will be taken up in appendix B. Also in appendix B, we shall demonstrate the equivalence between the closed loop and the open loop solutions for this specific problem.

However, this problem in general cannot be solved
analytically. To illustrate this point, let us try to find
the minmax solution. The Hamiltonian of the problem is

$$H = \tfrac{1}{2}\left(q_1 u^2 - q_2 v^2\right) + \lambda\,(v - u) \qquad (3.66)$$

and the costate equation is

$$\dot{\lambda} = -\frac{\partial H}{\partial x} = 0 \qquad (3.67)$$

Therefore,

$$\lambda(t) = \text{constant} \qquad (3.68)$$

From the transversality condition:

$$\lambda(T) = x(T) \qquad (3.69)$$

Thus

$$\lambda(t) = \lambda(T) = x(T) \quad \forall\; 0 \le t \le T \qquad (3.70)$$


Now, for an arbitrary u, maximize J with respect to v.
Pontryagin's Maximum Principle states that

$$H(x^*, u, v^*, \lambda^*, t) \ge H(x, u, v, \lambda, t) \qquad (3.71)$$

where * indicates optimal quantities. Considering the terms
in the Hamiltonian which contain v, we have

$$-\tfrac{1}{2}\,q_2 v^2 + \lambda v \qquad (3.72)$$

If $q_2 < |\lambda|$, it can be shown that

$$v^* = \operatorname{sgn} \lambda = \operatorname{sgn} x(T) \qquad (3.73)$$

Substituting (3.73) back into (3.64) yields

$$\dot{x} = \operatorname{sgn} x(T) - u \qquad (3.74)$$

Integrating from 0 to T and rearranging,

$$x(T) = x_0 + T \cdot \operatorname{sgn} x(T) - \int_0^T u\,dt \qquad (3.75)$$


It is not clear that x(T) and/or sgn x(T) can be solved
from (3.75) unless all parameters, including the arbitrary
u, are assigned numerical values. In fact, equation (3.75) is
transcendental.

This problem then must be solved by numerical methods.
In the next section, two numerical solutions will be
offered. In one, a saddle point is assumed and DDP is used
directly to solve simultaneously for $u^*$ and $v^*$. In the
other, no saddle point is assumed and the algorithm developed
in the last chapter is used to solve for the maxmin and the
minmax solutions.
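The implicit character of equation (3.75) can be seen in a small sketch. Once the integral of u is given a numerical value, each sign hypothesis for x(T) can be tried and kept only if the resulting x(T) actually has that sign. The function name and the numbers used below are hypothetical and are not taken from the trial runs; the sketch also presumes the saturation condition that precedes (3.73).

```python
def consistent_final_states(x0, T, u_integral):
    """Return every x(T) that satisfies (3.75) for a given value of the
    integral of u over [0, T].

    Each sign hypothesis sgn x(T) = +1 or -1 is substituted into (3.75);
    the candidate is kept only if its sign agrees with the hypothesis.
    """
    solutions = []
    for sign in (+1.0, -1.0):
        x_T = x0 + T * sign - u_integral
        if x_T * sign > 0.0:
            solutions.append(x_T)
    return solutions


if __name__ == "__main__":
    # Hypothetical numbers: both sign hypotheses are self-consistent here,
    # so (3.75) alone does not single out x(T).
    print(consistent_final_states(10.0, 8.0, 6.0))
    # With a larger initial displacement only one hypothesis survives.
    print(consistent_final_states(20.0, 8.0, 6.0))
```

For some parameter values more than one sign hypothesis is consistent, which is one way to see that x(T) cannot simply be read off from (3.75) without numerical values.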

3.5  Numerical  Solutions 

Computer programs written in Fortran for the WATFIV
compiler are used to obtain numerical solutions for the
problem described in the last section. These programs are
listed in Appendix C. In the first program, the existence
of a saddle point is assumed, and DDP is used to solve
simultaneously for the optimal controls of both players.
In the other two programs, the minmax and the maxmin
solutions are searched for using the algorithm developed
in Chapter 2. As expected, the program with the saddle
point assumption contains fewer programming steps than
either of the other two programs. We shall call the
solution obtained under the saddle point existence
assumption the saddle point solution, and the other two
the minmax solution and the maxmin solution respectively.
A large number of batch jobs were run on the UCLA Campus
Computing Network's IBM System 360 Model 91.

Computation is extremely fast for all three programs, and
the execution times for the three types of solutions are
essentially the same. For a typical set of parameters,
each program takes approximately 0.2 second of execution
time for an 8-second encounter between the two players.


3.5.1  Algorithm  Mechanization 

An integration scheme is needed to mechanize the DDP
algorithm, both to integrate the state equation forward and
to integrate the set of equations (2.49) backward. Since
the structure of this problem calls for constant values of
the optimal controls during the entire interval of the game,
a simple Euler integration scheme can be used to obtain
accurate results.

To mechanize the algorithm on the computer, the problem
must be discretized. For this purpose, the encounter time
is divided into 64 increments. For a typical encounter time
of 8 seconds, each increment of time is then one-eighth of
a second.
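As a minimal sketch of the discretization just described (written in Python rather than the Fortran of Appendix C, and not the program listed there), the forward Euler integration of the state equation (3.64) and of the running cost in (3.63) can be written as follows. The function name and the choice of constant controls are assumptions made for illustration; the values $q_1 = q_2 = 10$ and $x_0$ = 10 kilofeet are those of the corresponding row of Table 4.

```python
def euler_encounter(x0, u, v, q1, q2, T=8.0, steps=64):
    """Forward Euler integration of x' = v - u with constant controls.

    Returns the final lateral displacement x(T) and the cost (3.63).
    With 64 steps over an 8 second encounter, each step is 1/8 second.
    """
    dt = T / steps
    x = x0
    running_cost = 0.0
    for _ in range(steps):
        # Integrand of the cost: 1/2 (q1 u^2 - q2 v^2)
        running_cost += 0.5 * (q1 * u * u - q2 * v * v) * dt
        # State equation (3.64)
        x += (v - u) * dt
    return x, 0.5 * x * x + running_cost


if __name__ == "__main__":
    x_T, J = euler_encounter(x0=10.0, u=0.75, v=1.5, q1=10.0, q2=10.0)
    print(x_T, J)
```

Because the controls are constant, the right-hand side of (3.64) does not vary over the encounter and the Euler scheme introduces no truncation error in x(T), which is consistent with the remark above that a simple scheme suffices.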

Even though the programs are written to accommodate
the step-size adjustment described in section 2.5.3, no
step-size adjustment was needed for the large set of
parameters used in the trial runs on this problem. Equation
(2.52) is satisfied in all of the trial runs.

Tables 1, 2, and 3 are computer printouts of the saddle
point solution, the minmax solution, and the maxmin solution
respectively for the following set of parameters:

$L = 180$ kilofeet

$S_E = 15$ kilofeet per second

$S_P = 7.5$ kilofeet per second

$T = \dfrac{L}{S_E + S_P} = 8$ seconds

The control limits are chosen as ten percent of their
respective forward velocities:

$|v| \le 1.5$ kilofeet per second
$|u| \le 0.75$ kilofeet per second
For the saddle point solution, both initial controls
were chosen as zero. USTAR and VSTAR are the controls that
respectively minimize and maximize the Hamiltonian in
each iteration. The number "1" in the "step adj" column
indicates that the set of equations (2.49) is integrated
backward to the time t = 0. It might be noted here that the
further the algorithm progresses, the more closely the
predicted cost change in the column "A(N)" agrees with the
actual cost change in the column "DELJ". For this set of
parameters, the saddle point solution converges in five
iterations with approximately 0.2 second of execution time.
Both optimal controls are saturated for this set of
parameters. The value VSTAR = 1 in the first iteration
satisfies equation (3.60 b) of the unconstrained problem.
Therefore, as in optimal control, the results here confirm
that a constrained differential game cannot be solved by
solving the unconstrained differential game and letting the
controls saturate when and if the resulting controls exceed
their limits. These values, however, can be used as the
initial controls for the algorithm, as illustrated in the
minmax and the maxmin solutions.

Table 2. Typical run of a minmax solution for the specified set of parameters

Table 3. Typical run of a maxmin solution for the specified set of parameters

In both the minmax and the maxmin solutions, the initial
controls are computed from equations (2.60a) and (2.60b),
and the saturated value is used whenever a control exceeds
its limit. In the minmax solution, the gradient of the
maximum cost in each overall iteration is always negative.
Similarly, in the maxmin solution, the gradient of the
minimum cost is always positive. This indicates that the
right directions are being searched. Note also that the
absolute values of the gradients form monotonically
decreasing sequences, which indicates that the algorithm
is converging.

In the minmax and the maxmin algorithms, DDP is used for
the inner optimization, and the gradient projection method
is used in the outer, or overall, optimization. For this
particular set of parameters, the minmax solution converges
in two overall iterations, with four iterations of the DDP
for the first inner maximization, while the maxmin solution
requires five overall iterations to converge but each inner
minimization converges in one iteration of the DDP. The
total computation time for each solution is again
approximately 0.2 second. Therefore, we can conclude that
there is no appreciable difference in the computation time
of this problem among the three types of solutions.

For all of the many sets of parameters run for this
problem, all three solutions give the same answers for the
optimal controls. Therefore, even though it has not been
rigorously proved analytically, we may heuristically say
that a saddle point does indeed exist for this type of
constrained linear-quadratic differential game, as
confirmed by our numerical experiments.
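The two-level structure described above (an inner optimization for one player, an outer gradient projection step for the other) can be illustrated with a simplified sketch. It is not the program of Appendix C: the controls are restricted to constants over [0, T], the inner optimization is done in closed form rather than by DDP, and the step size and iteration count are arbitrary choices. The closed-form inner maximization assumes $q_2 > T$, which holds for $q_2 = 10$ and T = 8 seconds.

```python
def clip(value, limit):
    """Project a scalar control onto the interval [-limit, limit]."""
    return max(-limit, min(limit, value))


def cost(u, v, x0, T, q1, q2):
    """Cost (3.63) for controls held constant over [0, T]."""
    x_T = x0 + (v - u) * T
    return 0.5 * x_T ** 2 + 0.5 * T * (q1 * u ** 2 - q2 * v ** 2)


def best_v(u, x0, T, q2, v_max):
    """Inner maximization over v for fixed u (assumes q2 > T)."""
    return clip((x0 - u * T) / (q2 - T), v_max)


def best_u(v, x0, T, q1, u_max):
    """Inner minimization over u for fixed v."""
    return clip((x0 + v * T) / (q1 + T), u_max)


def minmax(x0, T, q1, q2, u_max, v_max, step=0.002, iterations=2000):
    """Outer projected gradient descent on u of the inner maximum cost."""
    u = 0.0
    for _ in range(iterations):
        v = best_v(u, x0, T, q2, v_max)
        x_T = x0 + (v - u) * T
        grad_u = -T * x_T + q1 * T * u      # dJ/du at the inner maximizer
        u = clip(u - step * grad_u, u_max)  # projected descent step
    v = best_v(u, x0, T, q2, v_max)
    return u, v, cost(u, v, x0, T, q1, q2)


def maxmin(x0, T, q1, q2, u_max, v_max, step=0.002, iterations=2000):
    """Outer projected gradient ascent on v of the inner minimum cost."""
    v = 0.0
    for _ in range(iterations):
        u = best_u(v, x0, T, q1, u_max)
        x_T = x0 + (v - u) * T
        grad_v = T * x_T - q2 * T * v       # dJ/dv at the inner minimizer
        v = clip(v + step * grad_v, v_max)  # projected ascent step
    u = best_u(v, x0, T, q1, u_max)
    return u, v, cost(u, v, x0, T, q1, q2)


if __name__ == "__main__":
    params = dict(x0=10.0, T=8.0, q1=10.0, q2=10.0, u_max=0.75, v_max=1.5)
    print(minmax(**params))
    print(maxmin(**params))
```

For the parameter set of Table 4 with $x_0$ = 10 kilofeet, both searches in this sketch arrive at the same saturated controls and the same cost, in line with the numerical observation above.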

For the particular set of parameters shown in tables 1
through 3, equal weights are put on the penalties on the
controls of both players. In this case, we recall that the
unconstrained solution calls for an equal amount of control
from both players, so that neither can the pursuer get any
closer nor can the evader maneuver further away than the
initial lateral displacement $x_0$. In the constrained
case, however, since the pursuer is more limited in his
lateral speed, the evader can use his superior capability
to get further away than the initial lateral displacement,
as shown by x(T) = 16 kilofeet when $x_0$ = 10 kilofeet.

3.5.2  Effects  of  Parameter  Variations 

Table 4 illustrates the effects of changing the initial
condition $x_0$ with a fixed set of other parameters. As
expected, when the initial lateral displacement is small,
the solutions stay within the boundaries and are the same
as those obtained in the unconstrained case. For the set
of parameters shown in table 4, the solutions are the same
for both the constrained and the unconstrained case for
$|x_0| \le 7.5$ kilofeet.

With larger initial lateral displacement, the pursuer's
control becomes saturated. The evader can then take
advantage of his superior capability to obtain a larger
final lateral separation between the two, whereas we have
noted before in the unconstrained case that for
$q_1 = q_2$, which is the case here, the players can get
neither closer to nor further away from each other than
their initial lateral displacement. Besides making use of
his superior capability, the evader has another reason to
use more control in this case than he would have used in
the unconstrained case: the limit on the pursuer's control
has introduced a relative saving in the cost function for
the evader.

The evader's control becomes saturated when $x_0$ is only
9 kilofeet, whereas in the unconstrained case this same
$x_0$ would yield an optimal control of only 0.9 kilofeet
per second for the evader, which is only sixty percent of
his capability. For $|x_0| > 9$ kilofeet both players use
their maximum capabilities for their optimal controls. The
lateral miss distance is 6 kilofeet greater at the final
time than it was at the initial time. This difference is
brought about by the evader's superior capability and
remains the same for all $|x_0| \ge 9$ kilofeet.

Tables 5 and 6 show the effects of changing the pursuer's
weighting factor $q_1$ when the evader's weighting factor
is $q_2 = 10$ and $x_0$ = 8 kilofeet and 10 kilofeet
respectively. The control limits in both cases are
$|u| \le 0.75$ kilofeet per second and $|v| \le 1.5$
kilofeet per second. In table 5 we see that the pursuer's
control is saturated for $q_1 \le 12$. This is not
surprising because, with relatively small $q_1$, the gain
in the final lateral miss distance offsets the penalty of
using more control, so the pursuer uses as much control as
he possibly can. With larger $q_1$, however, the pursuer
is forced to use less control than his limit. The solutions
for $q_1 > 12$ in table 5 are the same as in the
unconstrained case, and the minmax and maxmin solutions
converge in one iteration. In table 6 both players are
forced to use their respective maximum controls because of
the relatively large value of $x_0$.
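The comparison with the unconstrained case can be checked by hand. The sketch below derives the unconstrained controls from the stationarity of the Hamiltonian (3.66) together with (3.70), which gives $u = x(T)/q_1$ and $v = x(T)/q_2$, and then solves $x(T) = x_0 + T(v - u)$ for x(T). The function names are hypothetical, and the expression is derived here under the assumption that its denominator is positive; it is offered as an illustration, not as a quotation of the formulas in section 3.3.

```python
def unconstrained_solution(x0, T, q1, q2):
    """Unconstrained controls, final state and cost for constant controls.

    Uses u = x(T)/q1 and v = x(T)/q2 (stationarity of (3.66) with
    lambda = x(T)) and x(T) = x0 + T (v - u).
    """
    denominator = 1.0 - T * (1.0 / q2 - 1.0 / q1)  # assumed positive
    x_T = x0 / denominator
    u = x_T / q1
    v = x_T / q2
    J = 0.5 * x_T ** 2 + 0.5 * T * (q1 * u ** 2 - q2 * v ** 2)
    return u, v, x_T, J


def within_limits(u, v, u_max, v_max):
    """True when neither control exceeds its hard limit."""
    return abs(u) <= u_max and abs(v) <= v_max


if __name__ == "__main__":
    # Table 5 settings: x0 = 8 kft, q2 = 10, |u| <= 0.75, |v| <= 1.5
    for q1 in (12.0, 14.0, 20.0):
        u, v, x_T, J = unconstrained_solution(8.0, 8.0, q1, 10.0)
        print(q1, round(u, 2), round(v, 2), round(x_T, 2), round(J, 2),
              within_limits(u, v, 0.75, 1.5))
```

With $q_1 = 14$ the unconstrained controls fall inside the limits and agree with the corresponding row of Table 5, whereas with $q_1 = 12$ the unconstrained pursuer control exceeds 0.75 kilofeet per second, so the constrained solution must differ.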

In table 7, the values of $x_0$, $q_1$, and $q_2$ are doubled
compared to the same parameters in table 5. Close
examination reveals that the solutions in table 5 and
table 7 follow the same relative pattern, even though the
absolute magnitudes of the unsaturated controls for both
players are lower in table 7 because of the greater
penalties on the control inputs.

Tables 8 and 9 demonstrate the effects of changing the
control limits. In table 8, both players have equal
capabilities; the optimal controls in this case then depend
upon the relative values of the penalty weights $q_1$ and
$q_2$ and the initial lateral displacement $x_0$. In table 9,
the pursuer's lateral capability exceeds that of the evader.
The limits on the control inputs are interchanged compared
to those in table 1. The pattern of the solutions, however,
is consistent once the interchange of limits is taken into
account.


3.5.3  Discussion  on  the  Algorithms 

Before we close this chapter, several points can be
made on the algorithms used in this section.

(1) All three types of solution are the same for each
particular set of parameters. Therefore, we can conclude
that for the linear-quadratic problem a saddle point exists
in both the constrained and the unconstrained cases.

(2) The saddle point solution takes fewer programming
steps than either the minmax solution or the maxmin
solution.

(3) Computation times are approximately the same for
all types of solution. All three types converge very
rapidly in most cases.

(4) The saddle point solution uses u(t) = 0 and v(t) = 0
as initial controls, whereas the minmax and the maxmin
solutions use the results of the unconstrained case as
initial controls (using saturated values wherever
appropriate). This, however, is a very minor modification
since the solution for the unconstrained case is very easy
to compute.


For $q_1 = q_2 = 10$, $|u| \le 0.75$, $|v| \le 1.5$

    x_0      u*      v*     x*(T)      J*

    1.0     0.10    0.10     1.00      0.50
    2.5     0.25    0.25     2.50      3.13
    5.0     0.50    0.50     5.00     12.50
    7.5     0.75    0.75     7.50     28.13
    8.0     0.75    1.00    10.00     32.50
    9.0     0.75    1.50    15.00     45.00
   10.0     0.75    1.50    16.00     60.50
   11.0     0.75    1.50    17.00     77.00
   12.0     0.75    1.50    18.00     94.50
   15.0     0.75    1.50    21.00    153.00
   20.0     0.75    1.50    26.00    270.50

Table 4. Effects of variation in $x_0$

$x_0 = 8$ kft, $q_2 = 10$, $|u| \le 0.75$, $|v| \le 1.5$

    q_1      u*      v*     x*(T)      J*

     1      0.75    0.98     9.86     12.25
     2      0.75    0.98     9.86     14.50
     4      0.75    0.98     9.86     19.00
     6      0.75    0.98     9.86     23.50
     8      0.75    0.99     9.89     28.00
    10      0.75    0.99     9.89     32.50
    12      0.75    0.99     9.90     37.00
    14      0.74    1.04    10.37     41.48
    16      0.71    1.14    11.43     45.71
    18      0.69    1.24    12.41     49.65
    20      0.67    1.33    13.33     53.33

Table 5. Effects of variation in $q_1$ with $x_0$ = 8 kft


    q_1      u*      v*     x*(T)      J*

     1      1.50    0.34     6.68     22.33
     5      1.46    0.36     7.26     58.18
    10      1.14    0.57    11.42     91.43
    15      0.94    0.71    14.12    112.94
    20      0.79    0.75    15.71    127.86
    25      0.66    0.75    16.69    138.33
    30      0.58    0.75    17.37    146.04
    35      0.51    0.75    17.90    151.97
    40      0.46    0.75    18.33    156.66

Table 9. Solutions when the pursuer's capability
exceeds that of the evader


CHAPTER  4 


A  NONLINEAR  STOCHASTIC  PURSUIT  -  EVASION  PROBLEM 


The most natural application of differential game
theory probably falls on a class of problems known as
pursuit-evasion, where two or more adversaries engage in a
combat type mission. The state of the art of this problem
has already been discussed in Chapter 2 of this report.

In this chapter, a model for a nonlinear stochastic
pursuit-evasion two-person zero-sum differential game will
be formulated. The problem will then be solved using the
simple algorithm developed in Chapter 2 for a set of
designated parameters. Lastly, many aspects of the
computational results will be compared to those obtained
by McFarland.

Several important features of two-person zero-sum
differential games will be illustrated by the problem
studied in this chapter. The dynamics of the problem are
nonlinear, using the set of sufficient statistics of the
actual physical entities. In this manner the elements in
the set of sufficient statistics can be treated as the
state variables of a deterministic problem, which greatly
reduces the complexity of the stochastic problem. Moreover,
the values of the cost function for the minmax and the
maxmin solutions of this problem are not the same. Thus, we
are presented with a realistic problem whose solutions are
not "saddlepoint", which substantiates the fact that a
saddlepoint does not have to exist in a general differential
game. This fact also serves to strengthen the two examples
presented in Chapter 2. The cost function of the problem
utilizes the probability of survival as a probabilistic
measure and provides a realistic flavor of a stochastic
differential game. Furthermore, the information sets
available to each player are limited to only those state
variables observable by each player.

4.1.  Description  of  the  Problem 

A simplification of the missile-antimissile intercept
problem will be studied in this chapter. An incoming
attacking missile, maneuverable laterally, is trying to
avoid being intercepted by an antimissile, also
maneuverable laterally. The attacking missile, however, is
also charged with the task of trying to destroy an isolated
target (a military installation, an industrial complex, or
any other strategic target) and thus cannot stray too far
away from a designated path.

On the other hand, the antimissile, which is trying to
defend the target, is launched from an area on or near the
target. A ground support radar will keep track of the
position of the oncoming attacking missile, and hence the
defender will have a full set of information on both his
own and the enemy positions. The target defender will make
use of this information and try to minimize the distance
of closest approach between it and the intruder. If it gets
close enough, the attacking missile is neutralized or
captured. Only one pass is allowed for this problem because
once the missiles pass one another, the antimissile will
not be able to turn around and try to catch the attacking
missile.

The control center of the attacking missile will be too
far away to observe the actual positions of both missiles
by radar. However, with the present technology, it is not
hard to visualize an attacking missile with an on-board
computing capability to compute its own displacement from a
designated path. Therefore, the attacking missile will only
be able to make use of the information on its own position.
The attacking missile is deemed to score, or accomplish its
mission, if it manages to avoid interception and yet reach
the target zone.

For this problem, we shall call the evader player U
and the interceptor player V.

To make the problem tractable, simplifying assumptions
will be made; nevertheless, significant features of the
general problem will be maintained. The simplified version
of the intercept problem is illustrated in Figure 6. One
simplifying assumption is that the motion is planar. This
is equivalent to a classical aerial combat encounter over a
flat earth.

The mean initial line of sight (LOS) between the two
players has the length L. This line will be used as a basic
reference line for the problem. The initial position
dispersion along the reference line L is much less
important to the problem than the lateral position
dispersion, since the variation in L only affects the
variation in the "interception time T" and the "target
engagement time $T_s$". The interception time T is defined
as the time when both players reach the locus of distance
of closest approach. The target engagement time $T_s$ is
defined as the time when player U reaches the line extended
from the target perpendicular to the line of sight L. Thus
T and $T_s$ may be regarded as fixed.

Figure 6. Schematic of Stochastic Pursuit-Evasion Problem

Both players' initial velocities are assumed parallel
to the line L. The lateral maneuvering of each player is
assumed uncoupled from the forward motion. This assumption
is not a serious detriment to the realism of the intercept
problem since the lateral displacement will typically not
exceed 5% of L on either side of the mean initial LOS.

The initial lateral position of each player is assumed
to be a random variable normally distributed about L with
mean equal to zero. The uncertainty in the lateral position
of the attacking missile arises from accumulated error
picked up during the launch and midcourse pre-engagement
phases of ICBM flight, whereas the uncertainty in the
lateral position of the defending antimissile is due to the
inaccuracy in controlling the violent acceleration
experienced during the launch phase, which is assumed to
occur prior to the commencement of this problem.

The capture zone is a measure of the potency of the
interceptor. The width of this zone depends upon the
characteristics of the proximity fuse used in the warhead
of the interceptor. The scoring zone $r_s$ is a measure of
both the potency of the attacking ICBM and the
vulnerability of the target. The width of $r_s$ depends
upon the explosive characteristics of the warhead of the
ICBM and the vulnerability of the target being attacked.
The target is considered destroyed if the ICBM can get
within the scoring
zone  at  the  time  Ts-  Normally  will  be  much  smaller  than 


4.2  Formulation  of  the  Problem 

The missile-antimissile problem described in the above
section will be formulated as a nonlinear stochastic
differential game as follows:

4.2.1 Dynamics of the Problem

The parameters that are important to both players are
their respective lateral positions normal to the line L.
As mentioned before, the lateral maneuvering is assumed
uncoupled from the forward motion, and the players are
assumed able to maneuver laterally by controlling their
lateral velocities:


$$\dot{x}_u(t) = c_u(t), \quad x_u(0) = x_{u0}; \qquad \dot{x}_v(t) = c_v(t), \quad x_v(0) = x_{v0} \qquad (4.1)$$

where $c_u(t)$ and $c_v(t)$ are instantaneous lateral velocities
controlled by U and V respectively. The initial conditions
$x_{u0}$ and $x_{v0}$ are random variables, normally distributed
with zero mean and variances $\sigma_u^2(0)$ and $\sigma_v^2(0)$.
These probability density functions are shown by equation
(4.2).

$$f(x_{u0}) = \frac{1}{\sigma_u(0)\sqrt{2\pi}} \exp\left[-\frac{x_{u0}^2}{2\sigma_u^2(0)}\right] \qquad (4.2a)$$

$$f(x_{v0}) = \frac{1}{\sigma_v(0)\sqrt{2\pi}} \exp\left[-\frac{x_{v0}^2}{2\sigma_v^2(0)}\right] \qquad (4.2b)$$

The lateral velocities $c_u(t)$ and $c_v(t)$ are then functions
of the random processes $x_u(t)$ and $x_v(t)$. These velocities
are limited to within 10% of their associated average forward
speeds to validate the uncoupled assumption.

The vector $x^T(t) = [x_u(t) \;\; x_v(t)]$ is assumed to be
a Gauss-Markov process, for which only two statistics, a mean
and a covariance, are needed to specify it completely. To
make this assumption valid, the system that generates the
process must be linear. Thus we are required to choose
$c_u(t)$ and $c_v(t)$ as linear functions of $x_u(t)$ and $x_v(t)$.

For the interceptor, player V, the important quantity
that will have to be minimized is the lateral distance
between him and the attacking missile U at the interception
time, $x_u(T) - x_v(T)$. However, at any particular time
$t < T$ before the interception time, $x_u(T) - x_v(T)$ is
not available to V. Therefore, V has no choice but to use
the most recent corresponding information that is the best
indicator for $x_u(T) - x_v(T)$, namely $x_u(t) - x_v(t)$.
Hence $c_v(t)$ is defined as

$$c_v(t) = v(t)\,[x_u(t) - x_v(t)] \qquad (4.3)$$

where v(t) = feedback gain function, to be found
             as V's control

      $x_u(t)$, $x_v(t)$ = current states of each player

For the attacking missile, player U, the most important
measure is the distance by which he misses the target,
$x_u(T_s)$, at the time when he crosses the target boundary.
Between the interception time T and the target engagement
time $T_s$, U is not at all affected by any action on V's
part. Therefore, the problem in this interval is an optimal
control problem with only one player U starting from an
initial state $x_u(T)$ and minimizing the final state
$x_u(T_s)$ using "reasonable" control along the way. The
problem in this duration can then be solved as a linear
quadratic problem with the result

$$x_u(T_s) = k\,x_u(T), \quad 0 < k < 1 \qquad (4.4)$$

where the fraction k depends upon the time duration
$T_s - T$ and the weight on the control u(t). For this
differential game then, we shall assume that U can reduce
$x_u(T)$ by a given fraction k during the interval
$[T, T_s]$. Ideally U would like to have his feedback
function as a function of $x_u(T_s)$ (since he cannot
observe the state $x_v(t)$ at any time).


Hence

$$c_u(t) = u(t)\,x_u(T_s) \qquad (4.5)$$

with u(t) = feedback gain function, to be found as
U's control.

Using (4.4) in (4.5),

$$c_u(t) = u(t)\,k\,x_u(T) \qquad (4.6)$$

Again, it is obvious that U does not have $x_u(T)$ at any
time prior to the time of interception, so equation (4.6)
is not causal. Therefore, the best he can do is to use the
most current information $x_u(t)$ instead. Hence we have

$$c_u(t) = u(t)\,k\,x_u(t) \qquad (4.7)$$

Substituting (4.3) and (4.7) into equations (4.1), we have
for $0 \le t \le T$

$$\dot{x}_u(t) = [k\,x_u(t)]\,u(t); \quad x_u(0) = x_{u0} \qquad (4.8a)$$

$$\dot{x}_v(t) = [x_u(t) - x_v(t)]\,v(t); \quad x_v(0) = x_{v0} \qquad (4.8b)$$

In matrix form,

$$\dot{x} = F x; \quad x(0) = x_0$$

where

$$x = \begin{bmatrix} x_u \\ x_v \end{bmatrix}, \quad F = \begin{bmatrix} k\,u(t) & 0 \\ v(t) & -v(t) \end{bmatrix} \qquad (4.9)$$

Note that we have arrived at the same equation as
McFarland. However, a different rationalization has been
used. The reason for the difference is that the assumption
that $c_u(t)$ and $c_v(t)$ equal zero in the interval
[t, T], used by McFarland, later conflicts with the actual
values of $c_u(t)$ and $c_v(t)$ in the computation. With the
above rationalization, no such assumption has to be made.

Note also that equation (4.9) is linear since neither
u(t) nor v(t) is affected by the actual value of the random
variables $x_u(t)$ and $x_v(t)$. Therefore, we can say that
u(t) and v(t) are not functions of $x_u(t)$ and/or $x_v(t)$.
Since $x_0$ is Gaussian, as given by equation (4.2), and x(t)
is generated by a linear process (4.9), the vector x(t)
will remain Gaussian.

We now proceed to derive dynamic equations for the set
of sufficient statistics of x(t). Since x(t) has dimension
2, the mean vector will be of dimension 2, and the
covariance matrix will contain 3 independent elements.
Normally then, the dynamics of this problem should consist
of 5 equations. However, it is easy to see that the mean
vector is zero:

$$E[x(t)] = 0 \quad \forall\; t \in [0, T] \qquad (4.10)$$

since the initial value is zero as shown in equation (4.2).
The covariance matrix is defined as

$$E[x(t)\,x^T(t)] = X(t) = \begin{bmatrix} \overline{x_u^2}(t) & \overline{x_u x_v}(t) \\ \overline{x_u x_v}(t) & \overline{x_v^2}(t) \end{bmatrix} \qquad (4.11)$$

where the overbar designates the expected value of the
quantity under it.

Now, if x(t) satisfies the usual Lipschitz condition, then

$$\dot{X}(t) = \frac{d}{dt} E[x(t)\,x^T(t)] = E\,\frac{d}{dt}[x(t)\,x^T(t)] = E[\dot{x}(t)\,x^T(t) + x(t)\,\dot{x}^T(t)] = F X + X F^T \qquad (4.12)$$


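Equation (4.12) can be expressed in components as:

$$\frac{d}{dt}\,\overline{x_u^2} = 2k\,u(t)\,\overline{x_u^2}(t); \quad \overline{x_u^2}(0) = \sigma_u^2(0) \qquad (4.13a)$$

$$\frac{d}{dt}\,\overline{x_u x_v} = [k\,u(t) - v(t)]\,\overline{x_u x_v}(t) + v(t)\,\overline{x_u^2}(t); \quad \overline{x_u x_v}(0) = 0 \qquad (4.13b)$$

$$\frac{d}{dt}\,\overline{x_v^2} = -2v(t)\,\overline{x_v^2}(t) + 2v(t)\,\overline{x_u x_v}(t); \quad \overline{x_v^2}(0) = \sigma_v^2(0) \qquad (4.13c)$$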


Note that equations (4.13) are nonlinear since the
controls u(t) and v(t) are indeed affected by the values of
the covariances of the state vector, and products of the
control and state variables appear.
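To make the covariance dynamics concrete, the following sketch integrates the three component equations of (4.12), written with the coefficients that follow from F in (4.9), by a simple Euler scheme. The gains u and v are held constant purely for illustration (in the game they are time functions to be found), and the function names, step count, and numerical values are hypothetical.

```python
import math


def integrate_covariances(sigma_u2, sigma_v2, k, u, v, T, steps=10000):
    """Euler integration of the component equations of X' = F X + X F^T.

    xuu, xuv and xvv stand for the expected values of x_u^2, x_u x_v
    and x_v^2. The gains u and v are constants (an assumption made here).
    """
    dt = T / steps
    xuu, xuv, xvv = sigma_u2, 0.0, sigma_v2
    for _ in range(steps):
        d_xuu = 2.0 * k * u * xuu
        d_xuv = (k * u - v) * xuv + v * xuu
        d_xvv = -2.0 * v * xvv + 2.0 * v * xuv
        xuu += d_xuu * dt
        xuv += d_xuv * dt
        xvv += d_xvv * dt
    return xuu, xuv, xvv


def closed_form_xuu(sigma_u2, k, u, T):
    """Exact solution of the first component equation for constant u."""
    return sigma_u2 * math.exp(2.0 * k * u * T)


if __name__ == "__main__":
    print(integrate_covariances(1.0, 2.0, 0.5, -0.5, 1.0, 1.0))
    print(closed_form_xuu(1.0, 0.5, -0.5, 1.0))
```

For constant gains the first equation has an exponential solution, which gives a convenient check on the integration.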

We can call $\overline{x_u^2}$, $\overline{x_u x_v}$, and $\overline{x_v^2}$ state
variables and use equations (4.13) directly as the dynamics,
or the state equations, of the problem. However, as we
shall see later, it is more convenient and more meaningful
to use the projected intercept and the target miss as the
state variables. These variables are defined as

$$x_I(t) = x_u(t) - x_v(t) \qquad (4.14a)$$

$$x_T(t) = k\,x_u(t) \qquad (4.14b)$$

where

$x_T(t)$ = current value of target miss

$x_I(t)$ = current value of projected intercept

then

$$\dot{x}_I(t) = u(t)\,x_T(t) - v(t)\,x_I(t) \qquad (4.15a)$$

$$\dot{x}_T(t) = k\,u(t)\,x_T(t) \qquad (4.15b)$$

Again, it is easy to see from (4.14) that the mean values
of $x_I(t)$ and $x_T(t)$ remain zero throughout the interval
[0, T]. Define the covariance matrix as

$$\begin{bmatrix} \overline{x_I^2} & \overline{x_I x_T} \\ \overline{x_I x_T} & \overline{x_T^2} \end{bmatrix} \qquad (4.16)$$


and  again  by  the  process  similar  to  equation  (4.12)  we  can 
show  that  the  elements  of  the  covariance  matrix  satisfy  the 
following  differential  equations: 

x^(t)  =  -2v(t)Xj,(t)  +  2u(t)Xj^(t)  ;  x^(0)  =  ff^(0)+0'J(0) 

r  2  (4.17a) 

x,(t)=  |ku(t)-v(t)J.  x(t)+u(t)x,(t)  ;  x,(0)  =  k  C  (0) 

^  2  1  u  (4.17b) 
Equations (4.17) generate a set of sufficient statistics for the problem, since the means of $x_I(t)$ and $x_T(t)$ are zero throughout the interval [0, T]. These equations will then serve as the dynamics or state equations for our problem. A successful use of these equations in solving the problem will demonstrate that a stochastic differential game can be treated as a deterministic game if a set of sufficient statistics can be found and used as state variables in the modelling of the state equations.
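As an illustration of how the covariance equations act as deterministic state equations, the following is a minimal sketch that propagates $(x_0, x_1, x_2)$ forward with a fourth-order Runge-Kutta step. The function names, the use of callables for the controls, and the third equation (taken from the dynamics as restated in section 4.4) are assumptions made for this sketch, not the report's Fortran implementation.

```python
import math


def covariance_rhs(x, u, v, k):
    """Right-hand side of the covariance dynamics for x = (x0, x1, x2)."""
    x0, x1, x2 = x
    return (
        -2.0 * v * x0 + 2.0 * u * x1,   # intercept-miss variance
        (k * u - v) * x1 + u * x2,      # cross covariance
        2.0 * k * u * x2,               # target-miss variance
    )


def rk4_step(x, t, h, u_of_t, v_of_t, k):
    """One classical Runge-Kutta step of size h starting from time t."""
    def f(tt, xx):
        return covariance_rhs(xx, u_of_t(tt), v_of_t(tt), k)

    def shifted(xx, a, dd):
        return tuple(xi + a * di for xi, di in zip(xx, dd))

    k1 = f(t, x)
    k2 = f(t + h / 2, shifted(x, h / 2, k1))
    k3 = f(t + h / 2, shifted(x, h / 2, k2))
    k4 = f(t + h, shifted(x, h, k3))
    return tuple(
        xi + h / 6 * (a + 2 * b + 2 * c + d)
        for xi, a, b, c, d in zip(x, k1, k2, k3, k4)
    )


def integrate_covariance(u_of_t, v_of_t, k, sigma_u0, sigma_v0, T, n_steps):
    """Return the list of states (x0, x1, x2) at the n_steps + 1 grid points."""
    h = T / n_steps
    x = (sigma_u0**2 + sigma_v0**2, k * sigma_u0**2, k**2 * sigma_u0**2)
    path = [x]
    for i in range(n_steps):
        x = rk4_step(x, i * h, h, u_of_t, v_of_t, k)
        path.append(x)
    return path


# Hypothetical usage: u = 0 and constant v, for which the closed form
# x0(T) = x0(0) * exp(-2 v T) is available as a check.
path = integrate_covariance(lambda t: 0.0, lambda t: 0.25, 0.5, 3.0, 0.5, 8.0, 64)
```

With u(t) = 0 and constant v, the numerical result can be compared against the exponential solutions, which is a convenient sanity check before any optimization is attempted.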

4.2.2  Cost  Function  of  the  Problem 

In order to find the "best" controls for each player, some criterion will have to be established to discriminate one control from another. Since one of our goals is to be as realistic as possible, and since this is a stochastic problem, the probability of survival of the target seems to be the ultimate criterion. The attacking missile, wanting to destroy the target, will try to minimize the probability of survival of the target, whereas the interceptor, defending the target, will try to maximize it. We shall now attempt to find the probability of survival as a function of the state variables.

$$P(\text{survival}) = 1 - P(\text{not captured and score}) \tag{4.18}$$

$$= 1 - P(\text{score} \mid \text{not captured})\,P(\text{not captured})$$

The last step follows from Bayes' law of conditional probability. Using the definition of the scoring zone,

$$P(\text{score} \mid \text{not captured},\, x_T(T)) = \begin{cases} 1 & \text{for } |x_T(T)| \le r_s \\ 0 & \text{otherwise} \end{cases} \tag{4.19}$$


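Now

$$P(\text{score},\, x_T(T) \mid \text{not captured}) = P(\text{score} \mid x_T(T),\, \text{not captured}) \cdot P(x_T(T) \mid \text{not captured})$$

Therefore

$$P(\text{score} \mid \text{not captured}) = \int_{-r_s}^{r_s} p_{x_T(T) \mid \text{not captured}}(\xi)\,d\xi \tag{4.20}$$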


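where the conditional probability density function is defined as

$$p_{x_T(T) \mid \text{not captured}}(\xi) = \frac{1}{\sqrt{2\pi x_2(T)}} \exp\left[-\xi^2 / 2x_2(T)\right]$$

Substitute this into (4.20) and use the symmetric property of the normal probability density function:

$$P(\text{score} \mid \text{not captured}) = 2\int_0^{r_s} \frac{\exp\left[-\xi^2 / 2x_2(T)\right]}{\sqrt{2\pi x_2(T)}}\,d\xi = \operatorname{erf}\left[\frac{r_s}{\sqrt{2x_2(T)}}\right] \tag{4.21}$$

where

$$\operatorname{erf}\left[\frac{r_s}{\sqrt{2x_2(T)}}\right] = \frac{2}{\sqrt{\pi}} \int_0^{r_s/\sqrt{2x_2(T)}} \exp(-\eta^2)\,d\eta \tag{4.22}$$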


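The error function (erf) is a Fortran built-in function and can be called directly from a Fortran program. Now

$$P(\text{not captured}) = 1 - P(\text{captured})$$

Using the definition of the capture zone,

$$P(\text{captured} \mid x_I(T)) = \begin{cases} 1 & \text{if } -r_c < x_I(T) < r_c \\ 0 & \text{otherwise} \end{cases} \tag{4.23}$$

Thus

$$P(\text{not captured}) = 1 - \int_{-r_c}^{r_c} p_{x_I(T)}(\xi)\,d\xi \tag{4.24}$$

Since $x_I(T)$ is Gaussian, we can show that

$$P(\text{not captured}) = 1 - \operatorname{erf}\left[\frac{r_c}{\sqrt{2x_0(T)}}\right] \tag{4.25}$$

Substituting (4.21) and (4.25) back into (4.18), we obtain

$$\phi[x_0(T), x_2(T)] \triangleq P(\text{survival}) = 1 - \operatorname{erf}\left[\frac{r_s}{\sqrt{2x_2(T)}}\right] \left\{ 1 - \operatorname{erf}\left[\frac{r_c}{\sqrt{2x_0(T)}}\right] \right\} \tag{4.26}$$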


By definition, both $x_0(T)$ and $x_2(T)$ must be positive. From (4.26), when $x_0(T) = 0$ then P(survival) = 1. This checks out with the fact that perfect interception ensures certain survival. For combinations of low $x_2(T)$ and high $x_0(T)$ the probability of survival approaches zero; this again checks out with the condition in which the attacking missile successfully evades the interceptor and yet manages to reach the target.
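The limiting behaviour described above is easy to check numerically. The sketch below evaluates the terminal cost (4.26) with Python's standard `math.erf`; the function name and argument order are choices made for this illustration only.

```python
import math


def survival_probability(x0_T, x2_T, r_s, r_c):
    """Terminal cost (4.26): probability that the target survives.

    x0_T is the intercept-miss variance and x2_T the target-miss variance
    at the terminal time; both must be strictly positive here.
    """
    p_score_if_not_captured = math.erf(r_s / math.sqrt(2.0 * x2_T))
    p_not_captured = 1.0 - math.erf(r_c / math.sqrt(2.0 * x0_T))
    return 1.0 - p_score_if_not_captured * p_not_captured


# Near-perfect interception: survival is essentially certain.
p_intercepted = survival_probability(1e-12, 2.25, 0.5, 0.25)
# Large intercept miss with a tight target miss: survival is unlikely.
p_evaded = survival_probability(1e6, 1e-6, 0.5, 0.25)
```

The function divides by the square roots of the variances, so the exact case $x_0(T) = 0$ has to be approached as a limit rather than passed in directly.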

In order to realize reasonable and realistic controls for the problem, an integral penalty function must be added to the cost function. The most direct method is to penalize the squared values of the controls u(t) and v(t) with proper weights. Using this approach, we define

$$I(u,v) \triangleq \int_0^T \left[Q_1 u^2(t) - Q_2 v^2(t)\right] dt \tag{4.27}$$

where $Q_1$ and $Q_2$ are positive quantities representing appropriate penalizing weights on controls u(t) and v(t) respectively. The actual choice of $Q_1$ and $Q_2$ will be discussed later. The penalizing term for U is positive because U is trying to minimize the cost function. This term will restrict U from using "too large" a control. Player V has a negative penalizing term because he is trying to maximize the cost function. Too large a v(t) in any interval of time could result in a negative value for I(u,v). The composite cost functional for this problem will then be:

$$J(u,v) = \phi[x_0(T), x_2(T)] + I(u,v) \tag{4.28}$$

4.2.3  Constraints 

This pursuit-evasion problem has been formulated in such a way that the subsequent effective lateral velocities will not exceed 10% of the associated average forward speeds. Thus the assumption that the forward motion is uncoupled from the lateral motion can be used throughout this chapter. In addition, hard constraints are put on u(t) and v(t) as follows:

$$|u(t)| \le 2, \quad |v(t)| \le 1 \quad \forall\, t \in [0, T] \tag{4.29}$$

These constraints have the equivalent effect of limiting the lateral accelerations of the missiles. The larger limiting factor for U as compared to V is used in order to be consistent with other parameters, which will be discussed in the section on the computational aspects of the problem.

4.3 Convergence Control Technique


Before we embark on other computational aspects of the problem, it is well to note here that the DDP algorithm will not converge for this problem without the use of some kind of convergence control scheme. McFarland used the "step-size" method developed by Jacobson and Mayne in solving his problem with good results. The so-called "step-size" method, when used with the first-order algorithm developed in this report, however, did not ensure convergence for this pursuit-evasion problem.

Obviously, some other convergence control scheme is required. One such scheme, which has demonstrated good convergence properties for a large variety of problems, was developed by Jarmark in 1975. Anderson has used this scheme to derive feedback control for pursuing spacecraft with excellent results. After the "step-size" method failed to provide convergence for the solution of this problem, several other convergence control schemes were tried. It was finally decided that Jarmark's scheme was the most suitable for this problem. This scheme will be briefly described in the rest of this section. More detailed discussion and proofs can be found in references (32) to (34).

The actual cost change deviates too much from the predicted cost change when the assumption that δx is small, made in the derivation of the DDP equations, is violated, so that the higher order terms in equation (2.33) cannot be neglected. Since the DDP algorithm worked out in chapter 2 deals with the inner minimization, we shall also deal exclusively with DDP minimization here. However, it is clear that the same technique can be used with inner maximization for the minmax case, with only a few minor adjustments.

Jarmark (34) has shown that the magnitude of Δx(t) can be restricted, and the Taylor series expansion, equation (2.34), can be made valid, by adding a penalty term to the integral of the cost functional, equation (2.20). Thus equation (2.20) can be rewritten as follows:

$$J(u(t)) = F(x(T), T) + \int_0^T \left[L(x,u,t) + \Delta u(t)^T W \Delta u(t)\right] dt \tag{4.30}$$

Jarmark then proceeds to show, by using theorems and lemmas, that:

1.  Δu in each iteration, as measured by the metric

d(u^,u^“^)  =  I  . (4.31) 

can  be  made  arbitrarily  small  by  the  choice  of  the  weight¬ 
ing  matrix  W. 




2.  There exists a W such that the series expansions, equations (2.33) and (2.34), are valid.

3.  For W > 0, a reduction in cost at each iteration is obtained if Δu(t) ≠ 0 for some t ∈ [0, T].

4.  The solution of the artificial cost in equation (4.30) converges to the same solution as the original cost, equation (2.20).

These are existence theorems, and so far there is no hard and fast rule on how to choose W. If the elements of W are too large, the convergence will be slow. On the other hand, if the elements of W are too small, the assumption that Δx is small may not be valid. Jarmark suggests the following procedure.

For a starting value, choose a W based on prior experience with the same type of problem. The structure of the problem could be used; for example, the elements of W should be small when the problem is close to a linear problem. After each iteration, the stopping rule will have to be changed from |a(0)| < ε to

$$|a(0)| < \frac{\varepsilon}{1 + \|W\|} \tag{4.32}$$

If the stopping rule is not satisfied, then use the convergence index domain shown in Figure 7 to adjust the elements of W.

Figure 7.  Convergence index domain

Area I: ΔJ/a(0) < 0; do not accept the iteration, and increase the component of W 2-5 times.

Area  II:  the  element  of  W  can  be  adjusted  by  the 
following  formula 

-  (1  -  Aj^/(a^(0)s))  (H^  +  w*) 


(4.33) 


Area III: use approximately the same value of W as in the last iteration; W may be increased or decreased

slightly  in  this  situation.  Increase  if  close  to 
^  axis,  and  decrease  otherwise. 

This  procedure  is  used  with  very  good  convergence 
property  for  the  present  problem. 
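The accept/reject part of this procedure can be sketched in a few lines. The code below covers only the modified stopping rule and the Area I test for a scalar weight; the function names and the default growth factor of 3 (chosen inside the 2-5 range quoted above) are assumptions, and the Area II and Area III adjustments are deliberately left out because they depend on the boundaries drawn in Figure 7.

```python
def stopping_rule_met(a0, eps, w_norm):
    """Modified stopping rule: |a(0)| < eps / (1 + ||W||)."""
    return abs(a0) < eps / (1.0 + w_norm)


def screen_iteration(delta_j, a0, w, growth=3.0):
    """Area I test only.

    delta_j is the actual cost change and a0 the predicted change a(0).
    If they have opposite signs the iteration is rejected and the weight
    is enlarged; otherwise the iteration is accepted with w unchanged.
    """
    if a0 != 0.0 and delta_j / a0 < 0.0:
        return False, growth * w
    return True, w


# Hypothetical usage with illustrative numbers.
accepted, w_next = screen_iteration(delta_j=-0.194, a0=-0.013, w=1.0)
done = stopping_rule_met(a0=-0.013, eps=0.001, w_norm=1.0)
```

Keeping the rejection test separate from the weight adjustment makes it straightforward to add the remaining areas once their boundaries are fixed.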

4.4 Computational Aspects of the Problem

We now have the problem in which the attacking missile U has to find u(t) to minimize the maximum possible cost and the intercepting missile V has to find v(t) to maximize the minimum possible cost. The cost functional and the dynamics of the problem as developed in section 4.2 may be written as follows:

Cost Functional

$$J(u,v) = 1 - \operatorname{erf}\left[\frac{r_s}{\sqrt{2x_2(T)}}\right] \left\{ 1 - \operatorname{erf}\left[\frac{r_c}{\sqrt{2x_0(T)}}\right] \right\} + \int_0^T \left[Q_1 u^2(t) - Q_2 v^2(t)\right] dt \tag{4.34}$$

Dynamics

$$\dot{x}_0 = -2v x_0 + 2u x_1; \qquad x_0(0) = \sigma_u^2(0) + \sigma_v^2(0) \tag{4.35a}$$

$$\dot{x}_1 = [ku - v]\,x_1 + u x_2; \qquad x_1(0) = k\,\sigma_u^2(0) \tag{4.35b}$$

$$\dot{x}_2 = 2ku x_2; \qquad x_2(0) = k^2\sigma_u^2(0) \tag{4.35c}$$

These state equations are valid for t ∈ [0, T].

Constraints

$$|u(t)| \le 2, \quad |v(t)| \le 1 \quad \text{for } 0 \le t \le T \tag{4.36}$$

4.4.1  Parameter  Value  Assignment 

The missile engagement distance is assumed to be typically around 35 miles, or L = 180 kilofeet. The kilofoot is used here because it is a more convenient and more widely used unit for this type of problem. As mentioned before, the forward motion is uncoupled from the lateral motion. Therefore, only the average forward speeds of the players, rather than the instantaneous forward speeds over the whole engagement time interval, are needed. A typical forward speed for the attacking missile is $S_u$ = 15 kilofeet/second, while the intercepting missile is typically slower at $S_v$ = 7.5 kilofeet per second.

At these average speeds, the players will cross the line of interception at the time:

$$T = \frac{L}{S_u + S_v} = 8 \text{ seconds}$$

The attacking missile, if it escapes the interceptor, will cross the scoring boundary at the time:

$$T_s = \frac{L}{S_u} = 12 \text{ seconds}$$

Using these average forward speeds, the distance between the line of interception and the target is:

$$L_I = S_v T = 60 \text{ kilofeet}$$
The initial lateral dispersions from the line L are normally distributed with mean zero for both players. The initial lateral position of the attacking missile is simply more dispersed because it involves more distance and time covered before commencement of the differential game. The initial standard deviation for U's lateral position is $\sigma_u(0)$ = 3 kilofeet, while that of V is $\sigma_v(0)$ = 0.5 kilofeet.

The fact that the scoring zone is larger than the capture zone should be clear and has been explained in the description of the problem. We shall assume $r_s$ = 0.5 kilofeet and $r_c$ = 0.25 kilofeet for the purpose of this study.

A typical way of selecting the weights $Q_1$ and $Q_2$ for the penalty functions of the controls is to use

$$\frac{1}{Q_1} = T \times (\text{maximum value of } u^2) = 32$$

$$\frac{1}{Q_2} = T \times (\text{maximum value of } v^2) = 8$$

Experimentation around these values gives $Q_1$ = 0.0625 and $Q_2$ = 0.125 for the best results in this study. McFarland also used these values in his report. Between the time of interception and the target boundary crossing time, we have shown that U can cut his lateral dispersion down by a fixed fraction k depending upon the other parameter values. We shall assume k = 0.5 for this report.

In summary, we shall use:

T = 8 sec     $\sigma_u(0)$ = 3 kft      $r_s$ = 0.5 kft     $Q_1$ = 0.0625

k = 0.5       $\sigma_v(0)$ = 0.5 kft    $r_c$ = 0.25 kft    $Q_2$ = 0.125
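The derived quantities in this subsection follow from simple arithmetic on the assumed speeds and distance. The sketch below repeats that arithmetic; it assumes a head-on closing geometry for the interception time and an interceptor launched from the target, and the variable names are chosen for this illustration.

```python
# Engagement geometry (kilofeet and kilofeet/second).
L = 180.0     # engagement distance
S_u = 15.0    # average forward speed of the attacking missile U
S_v = 7.5     # average forward speed of the intercepting missile V

T = L / (S_u + S_v)   # time at which the players reach the line of interception
T_s = L / S_u         # time at which U would cross the scoring boundary
L_I = S_v * T         # distance from the line of interception to the target

# Starting guesses for the control weights: 1/Q = T * (largest squared control).
u_max, v_max = 2.0, 1.0
Q1_start = 1.0 / (T * u_max**2)
Q2_start = 1.0 / (T * v_max**2)

print(T, T_s, L_I, Q1_start, Q2_start)   # prints 8.0 12.0 60.0 0.03125 0.125
```

The starting guess for $Q_1$ is half of the value finally adopted after experimentation, while the starting guess for $Q_2$ coincides with the adopted value.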

4.4.2  Maxmin  Solution 

We shall follow the algorithm steps covered in section 2.5. To start off the algorithm for the maxmin control, a nominal control $v_0(t)$ can be approximated by maximizing the cost functional, equation (4.34), subject to the dynamic equations (4.35) with control u(t) = 0 for all t ∈ [0, T]. Using this nominal control for V will have the effect of forcing U to do "something" in order to minimize the cost functional.

With u(t) = 0, the state equations (4.35) become

$$\dot{x}_0 = -2v x_0; \qquad x_0(0) = \sigma_u^2(0) + \sigma_v^2(0) \tag{4.37a}$$

$$\dot{x}_1 = -v x_1; \qquad x_1(0) = k\,\sigma_u^2(0) \tag{4.37b}$$

$$\dot{x}_2 = 2ku x_2 = 0; \qquad x_2(0) = k^2\sigma_u^2(0) \tag{4.37c}$$

With these state equations, the Hamiltonian is:

$$H = Q_1 u^2 - Q_2 v^2 - 2v x_0 \lambda_0 - v x_1 \lambda_1 \tag{4.38}$$


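where the λ's are costate variables expressed by the following differential equations:

$$\dot{\lambda}_0 = -\frac{\partial H}{\partial x_0} = 2v\lambda_0; \qquad \lambda_0(T) = \frac{\partial \phi}{\partial x_0(T)} \tag{4.39a}$$

$$\dot{\lambda}_1 = -\frac{\partial H}{\partial x_1} = v\lambda_1; \qquad \lambda_1(T) = 0 \tag{4.39b}$$

$$\dot{\lambda}_2 = -\frac{\partial H}{\partial x_2} = 0; \qquad \lambda_2(T) = \frac{\partial \phi}{\partial x_2(T)} \tag{4.39c}$$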
Assuming for the moment that |v(t)| < 1 for 0 ≤ t ≤ T, the optimality condition is:

$$\frac{\partial H}{\partial v} = -2Q_2 v - 2x_0\lambda_0 - x_1\lambda_1 = 0$$

Hence

$$v(t) = -(2x_0\lambda_0 + x_1\lambda_1)/2Q_2 \tag{4.40}$$

Differentiating (4.40),

$$\dot{v}(t) = -(2\dot{x}_0\lambda_0 + 2x_0\dot{\lambda}_0 + \dot{x}_1\lambda_1 + x_1\dot{\lambda}_1)/2Q_2 \tag{4.41}$$

Using (4.37) and (4.39) in (4.41), we have

$$\dot{v}(t) = -(-4v x_0\lambda_0 + 4v x_0\lambda_0 - v x_1\lambda_1 + v x_1\lambda_1)/2Q_2 = 0 \tag{4.42}$$

Therefore,

$$v(t) = \text{constant} \tag{4.43}$$

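Now, equations (4.37) and (4.39) can be solved analytically with the following results:

$$x_0(t) = x_0(0)e^{-2vt}, \qquad \lambda_0(t) = \lambda_0(T)e^{-2v(T-t)}$$

$$x_1(t) = x_1(0)e^{-vt}, \qquad \lambda_1(t) = 0$$

$$x_2(t) = x_2(0), \qquad \lambda_2(t) = \lambda_2(T) \tag{4.44}$$

Substituting (4.44) back into (4.40),

$$v(t) = -2x_0(0)\lambda_0(T)e^{-2vT}/2Q_2 \tag{4.45}$$

and from (4.39a) we have

$$\lambda_0(T) = \frac{\partial \phi}{\partial x_0(T)} = -\frac{r_c \exp\left[-r_c^2/2x_0(T)\right]}{\sqrt{2\pi}\,x_0^{3/2}(T)} \operatorname{erf}\left[\frac{r_s}{\sqrt{2x_2(T)}}\right] \tag{4.46}$$

Using $x_0(T) = e^{-2vT}\left[\sigma_u^2(0) + \sigma_v^2(0)\right]$ in (4.46) and substituting back into (4.45) yields

$$v = \frac{r_c}{Q_2\sqrt{2\pi\left[\sigma_u^2(0) + \sigma_v^2(0)\right]}} \exp\left[vT - \frac{r_c^2\exp(2vT)}{2\left(\sigma_u^2(0) + \sigma_v^2(0)\right)}\right] \operatorname{erf}\left[\frac{r_s}{\sqrt{2}\,k\,\sigma_u(0)}\right] \tag{4.47}$$

Notice that equation (4.47) is transcendental; it must be solved numerically. With the parameters given in section 4.4.1, we found that v ≈ 0.25. Therefore, we shall use $v_0(t)$ = 0.25 for 0 ≤ t ≤ T as the starting nominal maxmin control for our algorithm.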

The next step is to find all the local minima of $J(u, v_0)$. Two local minimizing controls are found by repeated applications of the DDP routine for different values of the starting u(t). Table 10 summarizes the numerical results for the maxmin iterations using $v_0(t)$ = 0.25, $u_0^{(1)}(t)$ = -0.25, and $u_0^{(2)}(t)$ = 0.25. The two starting values of u(t) lead to two different minima. Extensive preliminary testing shows that only these two local minima exist for this problem.


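The pertinent equations for the application of the DDP routine are as follows:

$$H = Q_1 u^2 - Q_2 v^2 + 2(u x_1 - v x_0)J_{x_0} + ((ku - v)x_1 + u x_2)J_{x_1} + 2ku x_2 J_{x_2} \tag{4.48}$$

$$u^*(t) = -\left(2x_1 J_{x_0} + (k x_1 + x_2)J_{x_1} + 2k x_2 J_{x_2} - 2W u_0\right)/2(Q_1 + W) \tag{4.49}$$

If $|u^*(t)| > 2$, then set the corresponding $|u^*(t)| = 2$, since H is convex with respect to u. If $u^*(t)$ is not on the boundary, then equations (2.49) become

$$\dot{a}(t) = Q_1\left(u^*(t) - u_0(t)\right)^2 \tag{4.50a}$$

$$\dot{J}_{x_0}(t) = 2v_0(t)J_{x_0}(t) \tag{4.50b}$$

$$\dot{J}_{x_1}(t) = -2u^*(t)J_{x_0}(t) - \left(k u^*(t) - v_0(t)\right)J_{x_1}(t) \tag{4.50c}$$

$$\dot{J}_{x_2}(t) = -u^*(t)\left[J_{x_1}(t) + 2k J_{x_2}(t)\right] \tag{4.50d}$$

with the terminal conditions

$$a(T) = 0 \tag{4.51a}$$

$$J_{x_0}(T) = -\frac{r_c \exp\left(-r_c^2/2x_0(T)\right)}{\sqrt{2\pi}\,x_0^{3/2}(T)} \operatorname{erf}\left(\frac{r_s}{\sqrt{2x_2(T)}}\right) \tag{4.51b}$$

$$J_{x_1}(T) = 0 \tag{4.51c}$$

$$J_{x_2}(T) = \frac{r_s \exp\left(-r_s^2/2x_2(T)\right)}{\sqrt{2\pi}\,x_2^{3/2}(T)} \left[1 - \operatorname{erf}\left(\frac{r_c}{\sqrt{2x_0(T)}}\right)\right] \tag{4.51d}$$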
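The control update (4.49), together with the clipping rule, can be checked against a direct evaluation of the Hamiltonian. The sketch below assumes that the convergence-control penalty enters as $W(u - u_0)^2$ for a scalar weight W, which is what makes (4.49) the stationary point; the function names and the sample numbers are hypothetical.

```python
def minimizing_control(x, jx, u0, k, q1, w, u_max=2.0):
    """Control update (4.49), clipped to the hard constraint |u| <= u_max."""
    x0, x1, x2 = x
    jx0, jx1, jx2 = jx
    slope = 2.0 * x1 * jx0 + (k * x1 + x2) * jx1 + 2.0 * k * x2 * jx2
    u_star = -(slope - 2.0 * w * u0) / (2.0 * (q1 + w))
    return max(-u_max, min(u_max, u_star))


def augmented_hamiltonian(u, x, jx, u0, v, k, q1, q2, w):
    """Hamiltonian (4.48) plus the assumed penalty w * (u - u0)**2."""
    x0, x1, x2 = x
    jx0, jx1, jx2 = jx
    return (
        q1 * u * u - q2 * v * v
        + 2.0 * (u * x1 - v * x0) * jx0
        + ((k * u - v) * x1 + u * x2) * jx1
        + 2.0 * k * u * x2 * jx2
        + w * (u - u0) ** 2
    )


# Hypothetical state and cost-gradient values at one time step.
x = (9.25, 4.5, 2.25)
jx = (0.01, -0.02, 0.03)
u_new = minimizing_control(x, jx, u0=-0.25, k=0.5, q1=0.0625, w=1.0)
```

Because the augmented Hamiltonian is quadratic in u with positive leading coefficient $Q_1 + W$, clipping the unconstrained minimizer to the boundary gives the constrained minimizer, which is the convexity argument used in the text.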


Table  10.  Maxmin  Iteration 
124,  STEP = 0.58  (g, g) = .0094


MINCOST  CROSSOVER 


The time T = 8 seconds is divided into 64 intervals, and equation (4.49) is used before each step of integrating equations (4.50) backward. The value of $u^*(t)$ for each step is also stored in memory, to be used either as the minimizing control or as the nominal control $u_0(t)$ for the next iteration. Equation (4.50a) is used in case $u^*(t)$ is not on the boundary. If $u^*(t)$ is on the boundary, however, the following equation must be used:

$$-\dot{a}(t) = H(x, u^*, J_x, t) - H(x, u_0, J_x, t) \tag{4.52}$$

with the same terminal condition, equation (4.51a).

It must be noted that the simple Euler integration scheme, used very effectively in the last chapter, is not adequate for either the state or the DDP equations here. A more accurate integration scheme was needed; one such scheme is the fourth-order Runge-Kutta method. The Runge-Kutta integration scheme was used both in the forward integration of the state equations (4.17) and in the backward integration of the DDP equations (4.50).

Referring to Table 10, for $v_0(t)$ = 0.25 and $u_0^{(1)}(t)$ = -0.25, the cost is $J^{(1)}$ = 0.712. The convergence control weighting factor is W = 1. After integrating (4.50) back to t = 0, the predicted cost change is a(0) = -0.013. Using the new control $u^*(t)$ found in the process of backward integration, the new cost was evaluated and the cost change was ΔJ = -0.194. This process was repeated until |a(0)| was smaller than 0.001/(1 + W).
After eight iterations, the DDP routine converged; the latest nominal control is the local minimizing control $u^{(1)}(t)$ plotted in Figure 8b.

The second local minimizing control, $u^{(2)}(t)$, also shown in Figure 8b, was found in a similar manner using $v_0(t)$ = 0.25 and $u_0^{(2)}(t)$ = 0.25 as starting nominal controls. With these controls, $J^{(2)}$ = 0.878. Again using W = 1, the predicted cost change was found to be -0.0005 while the actual cost change was -0.018; the minimizing control in this iteration was accepted as the new nominal control, and so on. The DDP routine in this case converged in seventeen iterations.

At the two local minimizing controls, the function-space gradient of mincost with respect to the maximizing control is found by the following equation:

$$g(t) = -2Q_2 v(t) - 2x_0(t)J_{x_0}(t) - x_1(t)J_{x_1}(t) \tag{4.53}$$

The norm $(g^{(1)}, g^{(1)})$ and the inner product $(g^{(1)}, g^{(2)})$ were found by using

$$(g, g) = \int_0^T g^2(t)\,dt \tag{4.54}$$

Using the gradients $g^{(1)}(t)$ and $g^{(2)}(t)$, the norms $(g^{(1)}, g^{(1)})$ and $(g^{(2)}, g^{(2)})$, and the inner product $(g^{(1)}, g^{(2)})$, the step length was calculated. The logic used to find the step length for this problem is to alter the maximizing control $v_0(t)$ in such a way that an appreciable increase of the mincost is found and yet the minimizing control used in the last iteration will not be too far off from the new one. In addition, we do not want the new mincost $J^{(1)}$ to be greater than the new mincost $J^{(2)}$. Experimentation shows that a step chosen to realize a predicted change in mincost of 20% worked very well in most cases. However, the step size is limited to a maximum value of one. The new approximation to the maxmin control is then found by

Vj^Ct)  =  .  (4.54) 

This  is  plotted  in  Figure  8a. 
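The norm and inner product used in the step-length calculation are integrals over the engagement interval, so on the 64-interval grid they reduce to a quadrature sum. The sketch below uses the trapezoidal rule on equally spaced samples; the quadrature rule and the function name are assumptions, and the step-length formula itself is not reproduced here.

```python
def inner_product(g1, g2, T):
    """Trapezoidal estimate of the integral of g1(t) * g2(t) over [0, T].

    g1 and g2 are sequences of samples on the same equally spaced grid,
    including both end points (65 samples for 64 intervals).
    """
    n = len(g1) - 1
    h = T / n
    total = 0.5 * (g1[0] * g2[0] + g1[-1] * g2[-1])
    total += sum(a * b for a, b in zip(g1[1:-1], g2[1:-1]))
    return h * total


# Hypothetical usage: squared norm of a gradient sampled on the 64-interval grid.
grid = [8.0 * i / 64 for i in range(65)]
g = [0.1 * t for t in grid]
norm_sq = inner_product(g, g, 8.0)
```

Calling the function with the same sequence twice gives the squared norm, and with two different gradients it gives the cross term needed to compare the two local minima.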

The first new local minimum for $v_1(t)$ was reached in three iterations of the DDP routine, while the second one was reached in one iteration. Examination of the mincosts reveals that they are approaching one another. Thus we may suspect that a crossover point may be found. After three outer approximations to the maxmin control, the crossover point was indeed found; the values of the controls at this crossover point are shown in Table 11.

In Figure 9, pertinent state information is presented for the control combination v(t) and $u^{(1)}(t)$, or maxmin control combination number 1. The standard deviation for the target miss is plotted as $\sqrt{x_2}$; this curve shows that the probability density function of the projected target miss first expands because of the positive value of $u^{(1)}(t)$ up until the time t = 3.25 seconds. The negative value of $u^{(1)}(t)$ then causes the same probability density function to contract continually and end up at 0.3 kilofeet, a significant drop from the starting value of 1.5 kilofeet. The standard deviation for the projected intercept miss is plotted as $\sqrt{x_0}$. This curve shows that the probability density function of the projected intercept miss first expands because U is using greater positive control than V. However, around t = 3 seconds, V starts to apply greater positive control while U's control becomes negative; the probability density function contracts rapidly as the players move towards one another. The curve goes through a minimum and starts rising again, which in effect tells us that V has overshot the inside-maneuvering attacking missile after t = 7 seconds.

Figure  10  presents  the  maxmin  effective  lateral  veloci¬ 
ties  for  each  player  computed  from  the  following  equations: 

c^{t)  =  u^(t)  x,(t)  . (4.55a) 

u  ^ 

c^(t)  =  v^(t)  x^(t)  . (4.55b) 

These curves show that both quantities are well within the lateral velocity limits of 1.5 kilofeet/second for U and 0.75 kilofeet/second for V. The important point here is that U is much bolder in the maxmin game than V, since in this situation it is V who must play his "security level" control and guard against any possibility that U might come up with.

Figures 11 and 12 present the same information as Figures 9 and 10 respectively, only this time with the control combination of v(t) and $u^{(2)}(t)$. The negative feedback causes both standard deviations to drop initially until t = 6.6 seconds, after which time $\sqrt{x_2}$ increases because of the positive value of $u^{(2)}(t)$ and $\sqrt{x_0}$ levels off a little because U is now trying to "get away" from V rather than just "bearing down" on the target. From Figure 12, V is again much more conservative on the lateral velocity than U.

Figure 13 shows sample trajectories with control combinations v(t), $u^{(1)}(t)$ for Figure 13a, and v(t), $u^{(2)}(t)$ for Figure 13b. These trajectories are generated by equations (4.8) using $\sigma_u(0)$ and $\sigma_v(0)$ as the initial values for the random variables $x_u$ and $x_v$ respectively. Clearly, the control combination v(t), $u^{(1)}(t)$ represents the case where U initially maneuvers away from the target to draw V out, then uses his superior control capability to maneuver inside. The control combination v(t) and $u^{(2)}(t)$, by contrast, represents the case where U first tries to bear down on the target and then uses his superior capability to maneuver outside to get away from V at the line of intercept.

4.4.3  Minmax  Solution 

As before, we need a nominal $u_0(t)$ to start the algorithm for the minmax solution. Again we shall derive this nominal control by assuming $v_0(t)$ = 0 for 0 ≤ t ≤ T. With U using this control $u_0(t)$, V would be forced to do "something" to try to maximize the cost functional.

Proceeding in the same manner as in the maxmin case, it is found that with $v_0(t)$ = 0, $u_0(t)$ = constant. With this fact in mind, the state and costate equations can be solved as before and substituted in

$$u_0 = -\left[2x_1(t)\lambda_0(t) + k x_1(t)\lambda_1(t) + x_2(t)\lambda_1(t) + 2k x_2(t)\lambda_2(t)\right]/2Q_1 \tag{4.56}$$

The resulting transcendental equation, even though more complicated, possesses the same structure as equation (4.47). Solving this transcendental equation numerically gives $u_0$ = -0.125, which is used as the starting nominal control for the minmax algorithm.

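The next step is to find all the local maxima of $J(u_0, v)$ using the DDP routine, whose pertinent equations are as follows:

$$v^*(t) = \left(2W v_0 - 2x_0 J_{x_0} - x_1 J_{x_1}\right)/2(Q_2 + W) \tag{4.57}$$

$$-\dot{a}(t) = Q_2\left(v^* - v_0\right)^2 \tag{4.58a}$$

$$\dot{J}_{x_0}(t) = 2v^* J_{x_0} \tag{4.58b}$$

$$\dot{J}_{x_1}(t) = -2u_0 J_{x_0} - \left(k u_0 - v^*\right)J_{x_1} \tag{4.58c}$$

$$\dot{J}_{x_2}(t) = -u_0\left(J_{x_1} + 2k J_{x_2}\right) \tag{4.58d}$$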
Equation (4.57) is used for $|v^*| \le 1$. If $|v^*| > 1$, then set $|v^*| = 1$ while preserving the same sign, and use equation (4.52) instead of equation (4.58a). Incidentally, the terminal conditions on equations (4.58) are the same as equations (4.51). Successful backward integration of equations (4.58) while solving for $v^*$ in (4.57) depends upon the proper value of the weighting factor W for the convergence control. For the problem at hand, the DDP routine converges to one maximum value from a wide selection of starting nominal controls $v_0(t)$. Intuitively, once U's control is specified, his trajectory is predictable from equation (4.8a). Knowing U's trajectory, V can aim at the position on the interception line and simply minimize the intercept miss, which in turn maximizes the probability of survival of the target. It is reasonable to expect a unique optimal control for V to achieve this objective.

Table 12 summarizes the numerical results for the minmax iterations. For $u_0$ = -0.125 and $v_0$ = 0.125, the cost is J = 0.758, the predicted cost change is 0.003, and the actual cost change is 0.145. The new cost J is 0.903, and $v^*$ is accepted as $v_0$ for the next iteration, and so on. For the current $u_0$, the DDP routine converges in seven iterations.

The maxcost gradient, designated as $g_0$, is then computed for the control combination $u_0$, $v^*$. The norm of this maxcost gradient is computed from


Table 12.  Minmax Iteration

MIN   MAX   J      W       a(0)      ΔJ
1     --    .758   --      2.98E-3   0.145
            .903   --      3.30E-3   5.00E-2
            .953   --      4.40E-4   3.00E-2
            .956   --      1.50E-4   1.00E-4
            .956   --      3.10E-4   2.90E-3
            .959   0.59    7.00E-5   8.00E-5
            .959   0.53    1.80E-4   1.00E-3
            .960   (g_0, g_0) = 0.0119, STEP = 1.00
--    --    .822   1.00    2.67E-3   5.40E-2
            .936   --      2.00E-4   2.00E-3
            .938   1.46    7.00E-5   1.00E-3
            .939   (g_1, g_1) = 0.1019, STEP = 1.00
--    --    --     --      9.00E-5   1.60E-3
            --     --      8.00E-5   1.95E-3
            --     --      8.00E-5   1.00E-3
            .765   0.729   7.00E-5   1.00E-3
            .766   0.656   6.00E-5   1.00E-3
            .766   (g_2, g_2) = .0075, MINMAX SOLUTION

The step length is computed using the same logic as in the maxmin case. The new nominal control combination is

$$u_1(t) = u_0(t) - \text{step} \cdot g_0(t) \tag{4.60a}$$

$$v_1(t) = v_0^*(t) \tag{4.60b}$$

The process is then repeated. After two approximations for the outer minimization, the norm of the maxcost gradient is negligibly small and the minmax solution is found with

$$\hat{u}(t) = u_2(t) \tag{4.61}$$

The value of $\hat{u}(t)$ is shown in Table 13 on the sixty-four-increment time scale. The successively improved approximations to the minmax control are plotted in Figure 14a, while the corresponding maximizing controls are plotted in Figure 14b.

Figure 15 presents pertinent state information by showing the standard deviations for the projected intercept and the target miss. The probability density function for the target miss initially drops sharply, as shown by $\sqrt{x_2}$, because the players (using the minmax control combination) are moving towards one another in this time interval. The positive control $\hat{u}(t)$ for 4.4 ≤ t ≤ 8 not only slows down the rate of decrease of the intercept miss but also increases the target miss as U moves away from the LOS.

Figure 16 shows the reversal of roles between U and V.


Successive Approximations to Minmax Control

Figure 15.  Minmax Standard Deviation for Projected Intercept and Target Miss

Figure 16.  Minmax Effective Lateral Velocities
In the maxmin solution U was clearly the more aggressive player, while in the minmax solution U has to protect himself against all possibilities and becomes much more conservative than V.

This fact is also confirmed in Figure 17, which is a sample trajectory for the minmax solution. U stays fairly close to his initial lateral position, while V is considerably bolder in going out to meet the attacking missile. For the minmax case, the players pass the intercept line at a greater lateral distance from the target than in the maxmin case.

4.4.4  Net  Solution 

The  net  solution  of  the  game  consists  of  the  minmax 

A 

control  u(t)  for  the  attacking  missle  player  U  and  the 
maxmin  control  v(t)  for  the  intercepting  missle  player  V. 
Using  these  strategies  neither  player  needs  to  assume  any 
prior  knowledge  on  what  the  other  player  might  do.  The 
inequalities  (2.7)  also  assures  that  both  players  will  bene- 
benefit  by  using  these  "secure  strategies”. 

The net cost of the game, $J(\hat{u},\hat{v})$, is 0.756. This is
an improvement for U over the minmax cost of 0.766, and an
even greater improvement for V over the maxmin cost of 0.659.
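
The ordering of these three costs can be illustrated on a small finite game. The sketch below is not the report's algorithm: it uses a hypothetical 3x3 cost table (the numbers are invented for illustration) in which U picks a row to minimize the cost and V picks a column to maximize it. The function name `security_levels` is likewise an assumption. It shows that when no saddle point exists, the cost obtained when both players use their secure strategies lies between the maxmin and minmax values, as the figures 0.659, 0.756 and 0.766 quoted above do.

```python
def security_levels(cost):
    """cost[i][j]: cost when U (minimizer) picks row i and V (maximizer) picks column j."""
    rows = range(len(cost))
    cols = range(len(cost[0]))

    # U assumes the worst column for each row, then picks the least bad row.
    worst_for_u = [max(cost[i][j] for j in cols) for i in rows]
    i_star = min(rows, key=lambda i: worst_for_u[i])

    # V assumes the worst row for each column, then picks the least bad column.
    worst_for_v = [min(cost[i][j] for i in rows) for j in cols]
    j_star = max(cols, key=lambda j: worst_for_v[j])

    return {
        "minmax": worst_for_u[i_star],   # U's guaranteed ceiling
        "maxmin": worst_for_v[j_star],   # V's guaranteed floor
        "net": cost[i_star][j_star],     # cost when both play securely
    }


# Hypothetical cost table with no saddle point (illustrative numbers only).
COST = [[5, 7, 2],
        [3, 9, 4],
        [6, 1, 8]]

levels = security_levels(COST)
assert levels["maxmin"] <= levels["net"] <= levels["minmax"]

# The values quoted in the text obey the same ordering.
assert 0.659 <= 0.756 <= 0.766
```

In this table the minmax and maxmin values differ, so neither player's guaranteed level is attained and both do better than they had secured.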

Figure 18 shows a sample trajectory using the control
combination $\hat{u}(t)$ and $\hat{v}(t)$ generated by equations (4.8), using
standard deviations $\sigma_u(0)$ and $\sigma_v(0)$ for the initial values
$x_u(0)$ and $x_v(0)$ respectively.


4.5  Discussion  of  the  Problem 

This problem has raised several significant points
about the general nature of differential games and the relative
merit of the methods required to solve them. These points
are discussed in this section. First, the results of
this problem confirm that a saddle point need not
exist in a general two-person zero-sum differential game.
In this case, the maxmin solution is a crossover point. It
must be emphasized, however, that a crossover point can be
used as a solution only when the values of
the local optimal points in the inner optimization
move towards one another under the successive
approximations of the outer optimization process. Only a
very limited class of differential games can be shown to
possess a saddle point. One such class is the linear-dynamics,
quadratic-cost, two-person, zero-sum deterministic
differential game.

The seemingly simple problem described in this chapter,
with several simplifying assumptions, has turned out to be a
complicated problem with nonlinear dynamics and nonquadratic
cost. Even the initial guess of the nominal control,
obtained by solving an optimal control problem, cannot be
found analytically. The only way out was to use an
efficient algorithm to solve the problem numerically. The
use of a computer is inevitable.


The defensive aspect of using the minmax and the maxmin
solutions, the so-called "security level" solutions,
must be mentioned. With these strategies, each player can
rest assured that his opponent can do him no more harm
than he expects. He can only gain if the opponent
decides to switch to some other strategy. One traditional
way to solve an intercept problem was to derive a likely
strategy for the attacking missile. Many authors have used
a colored-noise process to describe the behavior of the
attacking missile and solved for the interception strategy
as an optimal control problem. It is unreasonable, however,
to expect that the attacking missile will oblige in the
actual case and behave like a colored-noise process,
especially if he knows that the interception strategy has been
derived in such a manner. Some authors have suggested a "mixed
strategy" as the solution of a differential game where a
saddle point does not exist. However, the "mixed strategy"
solution is very hard to compute even for a very simple,
unrealistic problem. Implementation of such a strategy
in an actual combat encounter seems far-fetched.

Sufficient statistics are used as state variables for
this problem to obtain meaningful results. This method
greatly reduces the complexity of the stochastic differential
game. However, the number of state variables is greater
than in the deterministic case. The current pursuit-evasion
problem requires two physical state variables, but the set of
sufficient statistics has five elements: two mean value functions
and three independent covariance matrix elements. If the
physical controls were accelerations rather than velocities,
four physical state variables would be required, leading to
fourteen elements in the set of sufficient statistics, and
so on. To keep the problem tractable and retain physical
insight, the problem should be simplified as much as
realistically possible.
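
The counts of five and fourteen quoted above follow from simple bookkeeping: n mean values plus the n(n+1)/2 independent elements of a symmetric n x n covariance matrix. A minimal sketch of that arithmetic is given below. The function name is hypothetical, and the sketch assumes, as the text states, that the sufficient statistics consist only of the means and the independent covariance elements.

```python
def sufficient_statistic_count(n_states):
    """Number of means plus independent covariance elements for n_states physical states."""
    means = n_states
    # A symmetric n x n covariance matrix has n(n+1)/2 independent entries.
    covariances = n_states * (n_states + 1) // 2
    return means + covariances


for n in (2, 4, 6):
    print(n, "physical states ->", sufficient_statistic_count(n), "sufficient statistics")
```

The count grows quadratically with the number of physical states, which is why the text recommends keeping the physical model as small as is realistic.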

One reason that differential games are much
more complicated than a simple extension of optimal
control problems is the existence of conjugate points.
The main reason conjugate points appear in differential
games as a rule rather than an exception is that one must
simultaneously maximize and minimize the same cost
function. The first-order algorithm developed for this
report at first did not converge for a wide variety of
initial conditions, even with the use of the "step-size"
convergence control discussed in Chapter 2, which
was used successfully in McFarland's algorithm. Jarmark's
scheme for convergence control was then used with excellent
results. Singular control becomes non-singular with the
proper convergence control weight described in section 4.3.

The formulation of the stochastic nonlinear pursuit-
evasion problem in this chapter turns out to be the same
model that McFarland used in his report. However, a
different rationalization was made in the formulation
process. It is believed that the rationalization used here
is more realistic, as explained in section 4.2. Since the
same model was arrived at, except for the addition of the
control constraints here, a comparison of the computation
results can be made. Since McFarland used a second-order
algorithm for the inner optimization and also computed a new
estimate of the inner optimal control for each successive
outer approximation, the amount of computation required
for each iteration of McFarland's DDP is about four times
the amount of computation required for each iteration
used in this report. On this basis, the amount of
computation required to reach the maxmin solution in this report
is about one-half the amount required by McFarland's
algorithm. The minmax computation requirement is even
more impressive: only 6.35 seconds of computer execution
time is required, and less than one-sixth of the
computation required by McFarland's method is needed
to reach the same solution. Therefore,
the algorithm presented in this report is more suitable
for real-time application to pursuit and evasion
differential games.


CHAPTER  5 


CONCLUSIONS  AND  RECOMMENDATIONS 

5.1  Conclusions 

This research deals mainly with the most natural
application of differential games, namely the pursuit-
evasion problem. Except for a very few simple, unrealistic
problems of this type, analytical solutions are virtually
impossible to obtain. A fast and efficient algorithm is
needed before realistic pursuit-evasion
problems can be solved and the solutions implemented under
actual physical conditions.

A numerical first-order method that makes no saddle point
assumption and is capable of handling control and state
constraints is developed in this report. The algorithm is
used to solve for the minmax and the maxmin solutions
independently, to be used by each player.

A linear quadratic differential game without any
constraint can be solved analytically even without a saddle
point assumption. The analytical solutions are offered for
this case. The case with a limiting control constraint
cannot be solved analytically. The numerical methods with
and without the saddle point assumption give the answer
for this case with negligible difference in computation
time.

A nonlinear stochastic pursuit-evasion problem is
developed and treated as a deterministic problem via a set
of sufficient statistics. This problem does not converge
with the algorithm developed in this report without an
efficient convergence control method. The step-size
convergence control is not adequate for this problem.
Jarmark's convergence control method solves this problem
when used in conjunction with the first-order algorithm
developed here. Together, the two provide a fast
computation time for the problem. The results of this
problem also strengthen the claim that a saddle point does
not exist in a general differential game, contrary to the
assumption used by many authors.

On the computational details, we found that the simple
Euler integration scheme is sufficient for the
simple linear-quadratic problem presented in Chapter 3.
In the nonlinear stochastic case of Chapter 4, however,
a more accurate integration scheme is needed. The fourth-
order Runge-Kutta integration scheme is used in this
latter case with excellent results.
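
The difference between the two integration schemes can be seen on a small test problem. The sketch below is illustrative only: it does not integrate the report's state or DDP equations, but the hypothetical scalar equation dx/dt = -x^2 with x(0) = 1, whose exact solution is x(t) = 1/(1 + t). All function names and the step count are assumptions made for the example.

```python
def euler_step(f, t, x, h):
    """One explicit Euler step (first-order accurate)."""
    return x + h * f(t, x)


def rk4_step(f, t, x, h):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, x)
    k2 = f(t + h / 2, x + h / 2 * k1)
    k3 = f(t + h / 2, x + h / 2 * k2)
    k4 = f(t + h, x + h * k3)
    return x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)


def integrate(step, f, x0, t_end, n_steps):
    """Advance x from t = 0 to t = t_end with a fixed step size."""
    h = t_end / n_steps
    t, x = 0.0, x0
    for _ in range(n_steps):
        x = step(f, t, x, h)
        t += h
    return x


def rhs(t, x):
    # Hypothetical nonlinear test dynamics: dx/dt = -x^2.
    return -x * x


exact = 1.0 / (1.0 + 1.0)  # x(1) for x(0) = 1
euler_error = abs(integrate(euler_step, rhs, 1.0, 1.0, 10) - exact)
rk4_error = abs(integrate(rk4_step, rhs, 1.0, 1.0, 10) - exact)
print("Euler error:", euler_error)
print("RK4 error:  ", rk4_error)
```

With the same ten steps, the Runge-Kutta scheme costs four function evaluations per step instead of one but gives a much smaller error on this test equation.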


5.2  Recommendations  for  Future  Research 


The first fascinating area that could be further
explored in this field is the improvement of the convergence
control method. Jarmark's convergence control technique
may be applied to the second-order algorithm developed
by McFarland and compared with the "step-size" convergence
control of Jacobson and Mayne. So far Jarmark has provided
only existence theorems for the convergence control
weighting matrix W. There is no hard and fast rule on how
to find the most efficient W. This author feels that the
convergence index domain used in this report is a
start in the right direction. There could be room
for improvement in this area.

Another  unexplored  area  is  to  let  W  be  time  variant. 
This  would  probably  be  suitable  for  the  algorithm  developed 
in  this  report.  In  the  backward  integration  of  the  DDP 
equations,  it  does  seem  that  the  optimal  control  generated 
near  the  final  time  T  should  be  more  accurate  than  that 
generated  further  out  near  the  starting  time  0.  Therefore, 
we  could  for  example  let  W  be  a  linear  function  of  time 
with  appropriate  slope  and  initial  value  to  be  found  by 
some  suitable  logic. 
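
As a rough illustration of this suggestion, the sketch below applies a weight that varies linearly with time to a first-order control update. It is not Jarmark's method as published. It assumes a scalar control and a quadratic penalty (1/2) W(t) δu^2 on the control change, so that the first-order step is δu = -H_u / W(t), where H_u is the gradient of the Hamiltonian with respect to the control. The function names, the initial value `w_start` and the `slope` are all hypothetical.

```python
def linear_weight(t, w_start, slope):
    """Hypothetical time-varying convergence weight W(t) = w_start + slope * t."""
    return w_start + slope * t


def damped_control_update(u, h_u, times, w_start, slope):
    """Apply du = -H_u / W(t) at every time node (assumed quadratic penalty on du)."""
    updated = []
    for u_k, g_k, t_k in zip(u, h_u, times):
        w_k = linear_weight(t_k, w_start, slope)
        if w_k <= 0.0:
            raise ValueError("W(t) must stay positive over the whole interval")
        updated.append(u_k - g_k / w_k)
    return updated


# Illustrative data: three time nodes, zero nominal control, unit gradient.
times = [0.0, 1.0, 2.0]
new_u = damped_control_update([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], times,
                              w_start=4.0, slope=-1.0)
print(new_u)
```

With a negative slope, as in this example, the control is changed more cautiously near the starting time and more freely near the final time. Whether that is the appropriate direction, and how the slope should be chosen, is the open question raised in the text.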

Time delay is another intriguing area that must be
faced in the real world. At present, only a few authors have
dealt with this subject, and all of them quickly specialize
to simple problems. It would be interesting and
beneficial to see how the information time delay would affect
the optimal strategies of the problem presented in this
report.


